Thread: Re: [HACKERS] Block level parallel vacuum

Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
On Thu, Nov 30, 2017 at 11:09 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Oct 24, 2017 at 5:54 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Yeah, I was thinking the commit is relevant to this issue, but as
>> Amit mentioned, this error is emitted by DROP SCHEMA CASCADE.
>> I haven't found the cause of this issue yet. With the previous
>> version of the patch, autovacuum workers were working with one
>> parallel worker, but it never drops relations. So it's possible that
>> the error was not related to the patch, but anyway I'll continue to
>> work on that.
>
> This depends on the extension lock patch from
> https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com/
> if I am following correctly. So I propose to mark this patch as
> returned with feedback for now, and come back to it once the root
> problems are addressed. Feel free to correct me if you think that's
> not adapted.

I've re-designed the parallel vacuum patch. Attached is the latest
version of the patch. As discussed so far, this patch depends on the
extension lock patch[1]. However, I think we can discuss the design
part of parallel vacuum independently from that patch. That's why I'm
proposing the new patch. In this patch, I restructured and refined
lazy_scan_heap(), because it's a single big function and not suitable
for parallelizing.

The parallel vacuum worker processes keep waiting for commands from
the parallel vacuum leader process. Before entering each phase of lazy
vacuum, such as scanning the heap, vacuuming indexes, and vacuuming
the heap, the leader process changes all workers' state to the next
state. Vacuum worker processes do the job according to their state and
wait for the next command after finishing. Also, before entering the
next phase, the leader process does some preparation work while the
vacuum workers are sleeping; for example, clearing the shared dead
tuple space before entering the 'scanning heap' phase. The status of
the vacuum workers is stored in a DSM area pointed to by WorkerState
variables and controlled by the leader process. For the basic design
and performance improvements, please refer to my presentation at
PGCon 2018[2].
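
To make this handshake concrete, here is a minimal standalone model of
the protocol described above. It is not the patch code and not
PostgreSQL infrastructure: POSIX threads, a mutex, and a condition
variable stand in for the background workers, the DSM area, and the
latches, and all of the names in it are made up.

#include <pthread.h>
#include <stdio.h>

#define NWORKERS 2

enum phase {IDLE, SCAN_HEAP, VACUUM_INDEX, VACUUM_HEAP, DONE};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static enum phase worker_phase[NWORKERS];   /* the "WorkerState" area */
static int finished[NWORKERS];

static void *
worker_main(void *arg)
{
    int         id = (int) (long) arg;
    enum phase  last = IDLE;

    for (;;)
    {
        enum phase  p;

        pthread_mutex_lock(&lock);
        while (worker_phase[id] == last)    /* wait for the next command */
            pthread_cond_wait(&cond, &lock);
        p = last = worker_phase[id];
        pthread_mutex_unlock(&lock);

        if (p == DONE)
            break;
        printf("worker %d: doing phase %d\n", id, (int) p);

        pthread_mutex_lock(&lock);
        finished[id] = 1;                   /* report completion */
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* leader: wait for the previous phase, prepare, then move everyone on */
static void
advance_all_workers(enum phase next)
{
    int         i;

    pthread_mutex_lock(&lock);
    for (i = 0; i < NWORKERS; i++)
        while (!finished[i] && worker_phase[i] != IDLE)
            pthread_cond_wait(&cond, &lock);

    /* leader-only preparation would go here, e.g. clearing the shared
     * dead tuple space before the 'scanning heap' phase */

    for (i = 0; i < NWORKERS; i++)
    {
        finished[i] = 0;
        worker_phase[i] = next;
    }
    pthread_cond_broadcast(&cond);          /* wake up the sleeping workers */
    pthread_mutex_unlock(&lock);
}

int
main(void)
{
    pthread_t   th[NWORKERS];
    long        i;

    for (i = 0; i < NWORKERS; i++)
        pthread_create(&th[i], NULL, worker_main, (void *) i);

    advance_all_workers(SCAN_HEAP);
    advance_all_workers(VACUUM_INDEX);
    advance_all_workers(VACUUM_HEAP);
    advance_all_workers(DONE);

    for (i = 0; i < NWORKERS; i++)
        pthread_join(th[i], NULL);
    return 0;
}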

The number of parallel vacuum workers is determined according to
either the table size or the PARALLEL option in the VACUUM command.
The maximum number of parallel workers is
max_parallel_maintenance_workers.

I've separated the code for the vacuum worker process into
backends/commands/vacuumworker.c, and created
includes/commands/vacuum_internal.h to declare the definitions for
lazy vacuum.

For autovacuum, this patch allows an autovacuum worker process to use
the parallel option according to the relation size or the reloption.
Regarding autovacuum delay, since there are no slots for autovacuum's
parallel workers in AutoVacuumShmem, this patch doesn't support
changing the autovacuum delay configuration while vacuuming is
running.

Please apply this patch together with the extension lock patch[1] when
testing, as this patch can try to extend visibility map pages
concurrently.

[1] https://www.postgresql.org/message-id/CAD21AoBn8WbOt21MFfj1mQmL2ZD8KVgMHYrOe1F5ozsQC4Z_hw%40mail.gmail.com
[2] https://www.pgcon.org/2018/schedule/events/1202.en.html

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
On Tue, Aug 14, 2018 at 9:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> [...]
>
> I've re-designed the parallel vacuum patch. Attached the latest
> version patch. As the discussion so far, this patch depends on the
> extension lock patch[1]. However I think we can discuss the design
> part of parallel vacuum independently from that patch. That's way I'm
> proposing the new patch. In this patch, I structured and refined the
> lazy_scan_heap() because it's a single big function and not suitable
> for making it parallel.
>
> The parallel vacuum worker processes keep waiting for commands from
> the parallel vacuum leader process. Before entering each phase of lazy
> vacuum such as scanning heap, vacuum index and vacuum heap, the leader
> process changes the all workers state to the next state. Vacuum worker
> processes do the job according to the their state and wait for the
> next command after finished. Also in before entering the next phase,
> the leader process does some preparation works while vacuum workers is
> sleeping; for example, clearing shared dead tuple space before
> entering the 'scanning heap' phase. The status of vacuum workers are
> stored into a DSM area pointed by WorkerState variables, and
> controlled by the leader process. FOr the basic design and performance
> improvements please refer to my presentation at PGCon 2018[2].
>
> The number of parallel vacuum workers is determined according to
> either the table size or PARALLEL option in VACUUM command. The
> maximum of parallel workers is max_parallel_maintenance_workers.
>
> I've separated the code for vacuum worker process to
> backends/commands/vacuumworker.c, and created
> includes/commands/vacuum_internal.h file to declare the definitions
> for the lazy vacuum.
>
> For autovacuum, this patch allows autovacuum worker process to use the
> parallel option according to the relation size or the reloption. But
> autovacuum delay, since there is no slots for parallel worker of
> autovacuum in AutoVacuumShmem this patch doesn't support the change of
> the autovacuum delay configuration during running.
>

Attached is a rebased version of the patch against the current HEAD.

> Please apply this patch with the extension lock patch[1] when testing
> as this patch can try to extend visibility map pages concurrently.
>

Because the extension lock patch[1] leads to performance degradation
when bulk-loading into a partitioned table, I think the original
proposal, which makes relation extension locks conflict within a group
locking group, is the more realistic approach. So I worked on this
with a simple patch instead of [1]. Attached are three patches:

* The 0001 patch publishes some static functions such as
heap_parallelscan_startblock_init so that the parallel vacuum code can
use them.
* The 0002 patch makes relation extension locks conflict within a
group locking group.
* The 0003 patch adds the parallel option to lazy vacuum.

Please review them.

[1] https://www.postgresql.org/message-id/CAD21AoBn8WbOt21MFfj1mQmL2ZD8KVgMHYrOe1F5ozsQC4Z_hw%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
On Tue, Oct 30, 2018 at 5:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> [...]
> Attached rebased version patch to the current HEAD.
>
> > Please apply this patch with the extension lock patch[1] when testing
> > as this patch can try to extend visibility map pages concurrently.
> >
>
> Because the patch leads performance degradation in the case where
> bulk-loading to a partitioned table I think that the original
> proposal, which makes group locking conflict when relation extension
> locks, is more realistic approach. So I worked on this with the simple
> patch instead of [1]. Attached three patches:
>
> * 0001 patch publishes some static functions such as
> heap_paralellscan_startblock_init so that the parallel vacuum code can
> use them.
> * 0002 patch makes the group locking conflict when relation extension locks.
> * 0003 patch add paralel option to lazy vacuum.
>
> Please review them.
>

Oops, forgot to attach patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From: Yura Sokolov
Excuse me for being noisy.

Increasing vacuum's ring buffer improves vacuum up to 6 times:
https://www.postgresql.org/message-id/flat/20170720190405.GM1769%40tamriel.snowman.net
This is a one-line change.

How much improvement does parallel vacuum give?

31.10.2018 3:23, Masahiko Sawada wrote:
> [...]



Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
Hi,

On Thu, Nov 1, 2018 at 2:28 PM Yura Sokolov <funny.falcon@gmail.com> wrote:
>
> Excuse me for being noisy.
>
> Increasing vacuum's ring buffer improves vacuum upto 6 times.
> https://www.postgresql.org/message-id/flat/20170720190405.GM1769%40tamriel.snowman.net
> This is one-line change.
>
> How much improvement parallel vacuum gives?

It depends on the hardware resources you can use.

In the current design, heap scanning and heap vacuuming are processed
by parallel workers at the block level (using a parallel sequential
scan), and index vacuuming is processed by parallel workers at the
index level. So even if a table is not that large, the more indexes it
has, the better performance you can get. The performance test results
I attached show that parallel vacuum is up to almost 10 times faster
than single-process vacuum in one case. That test used a not-so-large
table (4GB) with many indexes, but it would be interesting to test
with a larger table.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila
On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Attached rebased version patch to the current HEAD.
>
> > Please apply this patch with the extension lock patch[1] when testing
> > as this patch can try to extend visibility map pages concurrently.
> >
>
> Because the patch leads performance degradation in the case where
> bulk-loading to a partitioned table I think that the original
> proposal, which makes group locking conflict when relation extension
> locks, is more realistic approach. So I worked on this with the simple
> patch instead of [1]. Attached three patches:
>
> * 0001 patch publishes some static functions such as
> heap_paralellscan_startblock_init so that the parallel vacuum code can
> use them.
> * 0002 patch makes the group locking conflict when relation extension locks.
> * 0003 patch add paralel option to lazy vacuum.
>
> Please review them.
>

I could see that you have put a lot of effort into this patch, and
still we are not able to make much progress, mainly, I guess, because
of the relation extension lock problem.  I think we can park that
problem for some time (as we have already invested quite some time in
it), discuss the actual parallel vacuum patch a bit, and then come
back to it.  I don't know if that is right or not.  I am not sure we
can make this ready for the PG12 timeframe, but I feel this patch
deserves some attention.  I have started reading the main parallel
vacuum patch and below are some assorted comments.

+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable
class="parameter">N
+      </replaceable>a background workers. Collecting garbage on table
is processed
+      in block-level parallel. For tables with indexes, parallel
vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a
index are processed
+      by particular parallel vacuum worker. The maximum nunber of
parallel workers
+      is <xref linkend="guc-max-parallel-workers-maintenance"/>. This
option can not
+      use with <literal>FULL</literal> option.
+     </para>

There are a couple of mistakes in the above para:
(a) In "..a background workers.", the "a" seems redundant.
(b) "Collecting garbage on table is processed in block-level
parallel."/"Collecting garbage on table is processed at block-level in
parallel."
(c) "For tables with indexes, parallel vacuum assigns each index to
each parallel vacuum worker and all garbages on a index are processed
by particular parallel vacuum worker."
We can rephrase it as:
"For tables with indexes, parallel vacuum assigns a worker to each
index, and all garbage in an index is processed by that particular
parallel vacuum worker."
(d) Typo: nunber/number
(e) Typo: can not/cannot

I have glanced through part of the patch, but didn't find any README
or doc containing the design of this patch. I think without having the
design in place, it is difficult to review a patch of this size and
complexity. To start with, at least explain how the work is
distributed among workers: say there are two workers which need to
vacuum a table with four indexes, how does it work?  How does the
leader participate and coordinate the work?  Another part you can
explain is how the state is maintained during parallel vacuum,
something like you are trying to do in the below function:

+ * lazy_prepare_next_state
+ *
+ * Before enter the next state prepare the next state. In parallel lazy vacuum,
+ * we must wait for the all vacuum workers to finish the previous state before
+ * preparation. Also, after prepared we change the state ot all vacuum workers
+ * and wake up them.
+ */
+static void
+lazy_prepare_next_state(LVState *lvstate, LVLeader *lvleader, int next_state)

Still another thing is how the stats are shared between the leader and
workers.  I can understand a few things in bits and pieces while
glancing through the patch, but it would be easier to understand if
you document it in one place.  It can help reviewers to understand it.

Can you consider splitting the patch so that the refactoring you have
done in the current code to make it usable by parallel vacuum is a
separate patch?

+/*
+ * Vacuum all indexes. In parallel vacuum, each workers take indexes
+ * one by one. Also after vacuumed index they mark it as done. This marking
+ * is necessary to guarantee that all indexes are vacuumed based on
+ * the current collected dead tuples. The leader process continues to
+ * vacuum even if any indexes is not vacuumed completely due to failure of
+ * parallel worker for whatever reason. The mark will be checked
before entering
+ * the next state.
+ */
+void
+lazy_vacuum_all_indexes(LVState *lvstate)

I didn't understand what you want to say here.  Do you mean that the
leader can continue collecting more dead tuple TIDs while workers are
vacuuming the indexes?  How does it deal with errors, if any, during
index vacuum?

+ * plan_lazy_vacuum_workers_index_workers
+ * Use the planner to decide how many parallel worker processes
+ * VACUUM and autovacuum should request for use
+ *
+ * tableOid is the table begin vacuumed which must not be non-tables or
+ * special system tables.
..
+ plan_lazy_vacuum_workers(Oid tableOid, int nworkers_requested)

The comment starting from tableOid is not clear.  The actual function
name (plan_lazy_vacuum_workers) and the name in the comment
(plan_lazy_vacuum_workers_index_workers) don't match.  Can you take a
relation as the input parameter instead of taking tableOid, as that
can save a lot of code in this function?
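
Just to illustrate, I mean something along these lines (the function
name is from your patch; the rest is only a sketch):

static int
plan_lazy_vacuum_workers(Relation onerel, int nworkers_requested);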

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila
On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
>
> I could see that you have put a lot of effort on this patch and still
> we are not able to make much progress mainly I guess because of
> relation extension lock problem.  I think we can park that problem for
> some time (as already we have invested quite some time on it), discuss
> a bit about actual parallel vacuum patch and then come back to it.
>

Today, I was reading this and the previous related thread [1], and it
seems to me that multiple people (Andres [2], Simon [3]) have pointed
out that parallelization of the index portion is more valuable.  Also,
some of the results [4] indicate the same.  Now, when there are no
indexes, parallelizing heap scans also has benefits, but I think in
practice we will see more cases where the user wants to vacuum tables
with indexes.  So how about if we break this problem up in the
following way, where each piece gives a benefit of its own:
(a) Parallelize index scans wherein the workers will be launched only
to vacuum indexes.  Only one worker per index will be spawned.
(b) Parallelize per-index vacuum.  Each index can be vacuumed by
multiple workers.
(c) Parallelize heap scans where multiple workers will scan the heap,
collect dead TIDs and then launch multiple workers for indexes.

I think if we break this problem into multiple patches, it will reduce
the scope of each patch and help us in making progress.  Now, it's
been more than 2 years that we have been trying to solve this problem,
but we still haven't made much progress.  I understand there are
various genuine reasons, and all of that work will help us in solving
all the problems in this area.  How about if we first target problem
(a), and once we are done with that we can see which of (b) or (c) we
want to do first?


[1] - https://www.postgresql.org/message-id/CAD21AoD1xAqp4zK-Vi1cuY3feq2oO8HcpJiz32UDUfe0BE31Xw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/20160823164836.naody2ht6cutioiz%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CANP8%2BjKWOw6AAorFOjdynxUKqs6XRReOcNy-VXRFFU_4bBT8ww%40mail.gmail.com
[4] - https://www.postgresql.org/message-id/CAGTBQpbU3R_VgyWk6jaD%3D6v-Wwrm8%2B6CbrzQxQocH0fmedWRkw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
On Sun, Nov 25, 2018 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> >

Thank you for the comment.

> > I could see that you have put a lot of effort on this patch and still
> > we are not able to make much progress mainly I guess because of
> > relation extension lock problem.  I think we can park that problem for
> > some time (as already we have invested quite some time on it), discuss
> > a bit about actual parallel vacuum patch and then come back to it.
> >
>
> Today, I was reading this and previous related thread [1] and it seems
> to me multiple people Andres [2], Simon [3] have pointed out that
> parallelization for index portion is more valuable.  Also, some of the
> results [4] indicate the same.  Now, when there are no indexes,
> parallelizing heap scans also have benefit, but I think in practice we
> will see more cases where the user wants to vacuum tables with
> indexes.  So how about if we break this problem in the following way
> where each piece give the benefit of its own:
> (a) Parallelize index scans wherein the workers will be launched only
> to vacuum indexes.  Only one worker per index will be spawned.
> (b) Parallelize per-index vacuum.  Each index can be vacuumed by
> multiple workers.
> (c) Parallelize heap scans where multiple workers will scan the heap,
> collect dead TIDs and then launch multiple workers for indexes.
>
> I think if we break this problem into multiple patches, it will reduce
> the scope of each patch and help us in making progress.   Now, it's
> been more than 2 years that we are trying to solve this problem, but
> still didn't make much progress.  I understand there are various
> genuine reasons and all of that work will help us in solving all the
> problems in this area.  How about if we first target problem (a) and
> once we are done with that we can see which of (b) or (c) we want to
> do first?

Thank you for the suggestion. It seems good to me. We would get nice
performance scalability even with only (a), and vacuum will become
more powerful with (b) or (c). Also, (a) would not require resolving
the relation extension lock issue, IIUC. I'll change the patch and
submit it to the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila
On Mon, Nov 26, 2018 at 2:08 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> [...]
>
> Thank you for suggestion. It seems good to me. We would get a nice
> performance scalability even by only (a), and vacuum will get more
> powerful by (b) or (c). Also, (a) would not require to resovle the
> relation extension lock issue IIUC.
>

Yes, I also think so.  We do acquire the 'relation extension lock'
during index vacuum, but as part of (a) we are talking about one
worker per index, so there shouldn't be a problem with respect to
deadlocks.

> I'll change the patch and submit
> to the next CF.
>

Okay.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
On Tue, Nov 27, 2018 at 11:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [...]
>
> Yes, I also think so.  We do acquire 'relation extension lock' during
> index vacuum, but as part of (a), we are talking one worker per-index,
> so there shouldn't be a problem with respect to deadlocks.
>
> > I'll change the patch and submit
> > to the next CF.
> >
>
> Okay.
>

Attached are the updated patches. I scaled back the scope of this
patch. The patch now includes only feature (a), that is, it executes
both index vacuum and index cleanup in parallel. It also doesn't
include autovacuum support for now.

The PARALLEL option works almost the same as in the previous patch. In
the VACUUM command, we can specify the 'PARALLEL n' option, where n is
the number of parallel workers to request. If n is omitted, the number
of parallel workers is # of indexes - 1. We can also specify the
parallel degree by the parallel_workers reloption. The number of
parallel workers is capped by Min(# of indexes - 1,
max_parallel_maintenance_workers). That is, parallel vacuum can be
executed for a table if it has more than one index.
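
To illustrate the rule, a rough sketch (the function and parameter
names are made up; only Min/Max and max_parallel_maintenance_workers
are existing symbols, and I've assumed an explicit PARALLEL n takes
precedence over the reloption):

static int
compute_parallel_workers(int nindexes, int nworkers_requested,
                         int parallel_workers_reloption)
{
    int         nworkers;

    if (nworkers_requested > 0)             /* explicit PARALLEL n */
        nworkers = nworkers_requested;
    else if (parallel_workers_reloption > 0)    /* parallel_workers reloption */
        nworkers = parallel_workers_reloption;
    else
        nworkers = nindexes - 1;            /* default: one worker per index,
                                             * the leader takes one index */

    /* cap by the number of indexes and by the GUC */
    nworkers = Min(nworkers, nindexes - 1);
    nworkers = Min(nworkers, max_parallel_maintenance_workers);

    return Max(nworkers, 0);
}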

The details of the internal design are written in the comment at the
top of the vacuumlazy.c file. In parallel vacuum mode, we allocate a
DSM segment at the beginning of lazy vacuum, which stores shared
information as well as the dead tuples. When starting either index
vacuum or index cleanup, we launch the parallel workers. The parallel
workers perform either index vacuum or index cleanup for the indexes,
and exit after all indexes are done. The leader process then
re-initializes the DSM and re-launches workers the next time, rather
than destroying the parallel context here. After lazy vacuum is done,
the leader process exits parallel mode and updates the index
statistics, since we are not allowed any writes during parallel mode.
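
Roughly, the leader-side flow is like the following sketch (this is
not the patch itself; the parallel context is assumed to have been
created beforehand with EnterParallelMode() and CreateParallelContext()
pointing at the patch's worker entry point, and the DSM contents are
elided):

#include "postgres.h"
#include "access/parallel.h"

static void
parallel_vacuum_indexes_sketch(ParallelContext *pcxt, int npasses)
{
    int         pass;

    /* shared information and the dead tuple array live in the DSM */
    InitializeParallelDSM(pcxt);

    for (pass = 0; pass < npasses; pass++)
    {
        /* one pass = one round of index vacuum, or the final index cleanup */
        LaunchParallelWorkers(pcxt);

        /* the leader also processes indexes, then waits for its workers */
        WaitForParallelWorkersToFinish(pcxt);

        /* keep the parallel context; just reset the DSM for the next launch */
        if (pass < npasses - 1)
            ReinitializeParallelDSM(pcxt);
    }

    DestroyParallelContext(pcxt);
    ExitParallelMode();

    /*
     * Index statistics are updated only here, after leaving parallel mode,
     * because no writes are allowed while in parallel mode.
     */
}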

Also, I've attached a 0002 patch to support parallel lazy vacuum in
the vacuumdb command.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila
On Tue, Dec 18, 2018 at 1:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Attached the updated patches. I scaled back the scope of this patch.
> The patch now includes only feature (a), that is it execute both index
> vacuum and cleanup index in parallel. It also doesn't include
> autovacuum support for now.
>
> The PARALLEL option works alomst same as before patch. In VACUUM
> command, we can specify 'PARALLEL n' option where n is the number of
> parallel workers to request. If the n is omitted the number of
> parallel worekrs would be # of indexes -1.
>

I think for now this is okay, but I guess in the future when we make
heap scans parallel as well, or maybe allow more than one worker for
per-index vacuum, then this won't hold good. So, I am not sure if the
below text in the docs is the most appropriate.

+    <term><literal>PARALLEL <replaceable
class="parameter">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute index vacuum and cleanup index in parallel with
+      <replaceable class="parameter">N</replaceable> background
workers. If the parallel
+      degree <replaceable class="parameter">N</replaceable> is omitted,
+      <command>VACUUM</command> requests the number of indexes - 1
processes, which is the
+      maximum number of parallel vacuum workers since individual
indexes is processed by
+      one process. The actual number of parallel vacuum workers may
be less due to the
+      setting of <xref linkend="guc-max-parallel-workers-maintenance"/>.
+      This option can not use with  <literal>FULL</literal> option.

It might be better to use some generic language in the docs, something
like "If the parallel degree N is omitted, then vacuum decides the
number of workers based on the number of indexes on the relation,
which is further limited by max-parallel-workers-maintenance".  I
think you also need to mention in some way that you consider the
storage option parallel_workers.

Few assorted comments:
1.
+lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
{
..
+
+ LaunchParallelWorkers(lvstate->pcxt);
+
+ /*
+ * if no workers launched, we vacuum all indexes by the leader process
+ * alone. Since there is hope that we can launch workers in the next
+ * execution time we don't want to end the parallel mode yet.
+ */
+ if (lvstate->pcxt->nworkers_launched == 0)
+ return;

It is quite possible that the workers are not launched because we fail
to allocate memory, basically when pcxt->nworkers is zero.  I think in
such cases there is no use for being in parallel mode.  You can even
detect that before calling lazy_begin_parallel_vacuum_index.
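
For example, one way to arrange it could be roughly as below (the
parallel-context fields and calls are the existing API; the function
itself is only a sketch):

#include "postgres.h"
#include "access/parallel.h"

static bool
begin_parallel_vacuum_sketch(ParallelContext *pcxt)
{
    InitializeParallelDSM(pcxt);

    if (pcxt->nworkers == 0)
    {
        /* no workers could be planned for this context; give up on
         * parallel mode and fall back to serial index vacuum */
        DestroyParallelContext(pcxt);
        ExitParallelMode();
        return false;
    }

    LaunchParallelWorkers(pcxt);
    return true;
}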

2.
static void
+lazy_vacuum_all_indexes_for_leader(LVState *lvstate,
IndexBulkDeleteResult **stats,
+    LVTidMap *dead_tuples, bool do_parallel,
+    bool for_cleanup)
{
..
+ if (do_parallel)
+ lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);
+
+ for (;;)
+ {
+ IndexBulkDeleteResult *r = NULL;
+
+ /*
+ * Get the next index number to vacuum and set index statistics. In parallel
+ * lazy vacuum, index bulk-deletion results are stored in the shared memory
+ * segment. If it's already updated we use it rather than setting it to NULL.
+ * In single vacuum, we can always use an element of the 'stats'.
+ */
+ if (do_parallel)
+ {
+ idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
+
+ if (lvshared->indstats[idx].updated)
+ r = &(lvshared->indstats[idx].stats);
+ }

It is quite possible that we are not able to launch any workers in
lazy_begin_parallel_vacuum_index; in such cases we should not take the
parallel-mode path. Basically, as written, we can't rely on the
'do_parallel' flag.
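
For reference, my reading of the quoted loop is roughly the following
(the lvshared field names are from the snippet; the bound check, the
per-index call, and the other names are assumptions on my part): each
participant, leader or worker, atomically claims the next index
number, so every index is processed by exactly one process, and a
result already published in shared memory is reused instead of
starting from NULL.

for (;;)
{
    IndexBulkDeleteResult *r = NULL;
    uint32      idx;

    /* atomically claim the next unprocessed index */
    idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
    if (idx >= nindexes)
        break;                  /* every index has been claimed */

    /* reuse a bulk-delete result published by an earlier pass, if any */
    if (lvshared->indstats[idx].updated)
        r = &(lvshared->indstats[idx].stats);

    vacuum_one_index(Irel[idx], &r, lvshared, dead_tuples);  /* assumed helper */
}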


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
On Thu, Dec 20, 2018 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 18, 2018 at 1:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Attached the updated patches. I scaled back the scope of this patch.
> > The patch now includes only feature (a), that is it execute both index
> > vacuum and cleanup index in parallel. It also doesn't include
> > autovacuum support for now.
> >
> > The PARALLEL option works alomst same as before patch. In VACUUM
> > command, we can specify 'PARALLEL n' option where n is the number of
> > parallel workers to request. If the n is omitted the number of
> > parallel worekrs would be # of indexes -1.
> >
>
> I think for now this is okay, but I guess in furture when we make
> heapscans also parallel or maybe allow more than one worker per-index
> vacuum, then this won't hold good. So, I am not sure if below text in
> docs is most appropriate.
>
> +    <term><literal>PARALLEL <replaceable
> class="parameter">N</replaceable></literal></term>
> +    <listitem>
> +     <para>
> +      Execute index vacuum and cleanup index in parallel with
> +      <replaceable class="parameter">N</replaceable> background
> workers. If the parallel
> +      degree <replaceable class="parameter">N</replaceable> is omitted,
> +      <command>VACUUM</command> requests the number of indexes - 1
> processes, which is the
> +      maximum number of parallel vacuum workers since individual
> indexes is processed by
> +      one process. The actual number of parallel vacuum workers may
> be less due to the
> +      setting of <xref linkend="guc-max-parallel-workers-maintenance"/>.
> +      This option can not use with  <literal>FULL</literal> option.
>
> It might be better to use some generic language in docs, something
> like "If the parallel degree N is omitted, then vacuum decides the
> number of workers based on number of indexes on the relation which is
> further limited by max-parallel-workers-maintenance".

Thank you for the review.

I agree with your concern and with the text you suggested.

>  I think you
> also need to mention in some way that you consider storage option
> parallel_workers.

Added.

>
> Few assorted comments:
> 1.
> +lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
> {
> ..
> +
> + LaunchParallelWorkers(lvstate->pcxt);
> +
> + /*
> + * if no workers launched, we vacuum all indexes by the leader process
> + * alone. Since there is hope that we can launch workers in the next
> + * execution time we don't want to end the parallel mode yet.
> + */
> + if (lvstate->pcxt->nworkers_launched == 0)
> + return;
>
> It is quite possible that the workers are not launched because we fail
> to allocate memory, basically when pcxt->nworkers is zero.  I think in
> such cases there is no use for being in parallel mode.  You can even
> detect that before calling lazy_begin_parallel_vacuum_index.

Agreed. We can stop the preparation and exit parallel mode if
pcxt->nworkers is 0 after InitializeParallelDSM().

>
> 2.
> static void
> +lazy_vacuum_all_indexes_for_leader(LVState *lvstate,
> IndexBulkDeleteResult **stats,
> +    LVTidMap *dead_tuples, bool do_parallel,
> +    bool for_cleanup)
> {
> ..
> + if (do_parallel)
> + lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);
> +
> + for (;;)
> + {
> + IndexBulkDeleteResult *r = NULL;
> +
> + /*
> + * Get the next index number to vacuum and set index statistics. In parallel
> + * lazy vacuum, index bulk-deletion results are stored in the shared memory
> + * segment. If it's already updated we use it rather than setting it to NULL.
> + * In single vacuum, we can always use an element of the 'stats'.
> + */
> + if (do_parallel)
> + {
> + idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
> +
> + if (lvshared->indstats[idx].updated)
> + r = &(lvshared->indstats[idx].stats);
> + }
>
> It is quite possible that we are not able to launch any workers in
> lazy_begin_parallel_vacuum_index, in such cases, we should not use
> parallel mode path, basically as written we can't rely on
> 'do_parallel' flag.
>

Fixed.

Attached is a new version of the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
On Fri, Dec 28, 2018 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> [...]

Rebased.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From: Haribabu Kommi

On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Rebased.

I started reviewing the patch, but I haven't finished my review yet.
Following are some of my comments.

+    <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute index vacuum and cleanup index in parallel with

I doubt that users can understand the terms index vacuum and cleanup
index. Maybe it needs some more detailed information.


- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_PARALLEL = 1 << 7, /* do lazy VACUUM in parallel */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8 /* don't skip any pages */
+} VacuumOptionFlag;

Any specific reason behind not adding it as the last member of the enum?


-typedef enum VacuumOption
+typedef enum VacuumOptionFlag
 {

I don't find the new name quite good; how about VacuumFlags?


+typedef struct VacuumOption
+{

How about VacuumOptions? Because this structure can contain all the
options provided to the vacuum operation.



+ vacopt1->flags |= vacopt2->flags;
+ if (vacopt2->flags == VACOPT_PARALLEL)
+ vacopt1->nworkers = vacopt2->nworkers;
+ pfree(vacopt2);
+ $$ = vacopt1;
+ }

As the above statement indicates that only the last specified number
of parallel workers is taken into account, can we explain that in the
docs?


postgres=# vacuum (parallel 2, verbose) tbl;

With verbose, no information related to parallel workers is available.
I feel giving that information is required even when it is not a
parallel vacuum.


Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada
On Fri, Jan 18, 2019 at 10:38 AM Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> Rebased.
>
>
> I started reviewing the patch, I didn't finish my review yet.
> Following are some of the comments.

Thank you for reviewing the patch.

>
> +    <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
> +    <listitem>
> +     <para>
> +      Execute index vacuum and cleanup index in parallel with
>
> I doubt that user can understand the terms index vacuum and cleanup index.
> May be it needs some more detailed information.
>

Agreed. Table 27.22 "Vacuum phases" has a good description of the
vacuum phases, so maybe adding a reference to it would work.

>
> - VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
> + VACOPT_PARALLEL = 1 << 7, /* do lazy VACUUM in parallel */
> + VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8 /* don't skip any pages */
> +} VacuumOptionFlag;
>
> Any specific reason behind not adding it as last member of the enum?
>

My mistake, fixed it.

>
> -typedef enum VacuumOption
> +typedef enum VacuumOptionFlag
>  {
>
> I don't find the new name quite good, how about VacuumFlags?
>

Agreed with removing "Option" from the name but I think VacuumFlag
would be better because this enum represents only one flag. Thoughts?

>
> +typedef struct VacuumOption
> +{
>
> How about VacuumOptions? Because this structure can contains all the
> options provided to vacuum operation.
>

Agreed.

>
>
> + vacopt1->flags |= vacopt2->flags;
> + if (vacopt2->flags == VACOPT_PARALLEL)
> + vacopt1->nworkers = vacopt2->nworkers;
> + pfree(vacopt2);
> + $$ = vacopt1;
> + }
>
> As the above statement indicates the the last parallel number of workers
> is considered into the account, can we explain it in docs?
>

Agreed.

>
> postgres=# vacuum (parallel 2, verbose) tbl;
>
> With verbose, no parallel workers related information is available.
> I feel giving that information is required even when it is not parallel
> vacuum also.
>

Agreed. How about the following verbose output? I've added the number
of launched, planned, and requested vacuum workers and the purpose
(vacuum or cleanup).

postgres(1:91536)=# vacuum (verbose, parallel 30) test; -- table
'test' has 3 indexes
INFO:  vacuuming "public.test"
INFO:  launched 2 parallel vacuum workers for index vacuum (planned:
2, requested: 30)
INFO:  scanned index "test_idx1" to remove 2000 row versions
DETAIL:  CPU: user: 0.12 s, system: 0.00 s, elapsed: 0.12 s
INFO:  scanned index "test_idx2" to remove 2000 row versions by
parallel vacuum worker
DETAIL:  CPU: user: 0.07 s, system: 0.05 s, elapsed: 0.12 s
INFO:  scanned index "test_idx3" to remove 2000 row versions by
parallel vacuum worker
DETAIL:  CPU: user: 0.09 s, system: 0.05 s, elapsed: 0.14 s
INFO:  "test": removed 2000 row versions in 10 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  launched 2 parallel vacuum workers for index cleanup (planned:
2, requested: 30)
INFO:  index "test_idx1" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "test_idx2" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "test_idx3" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "test": found 2000 removable, 367 nonremovable row versions in
41 out of 4425 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 500
There were 6849 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.12 s, system: 0.01 s, elapsed: 0.17 s.
VACUUM

Since the previous patch conflicts with commit 285d8e12, I've attached
the latest version of the patch, which incorporates the review
comments I got.




Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From: Haribabu Kommi

On Fri, Jan 18, 2019 at 11:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Jan 18, 2019 at 10:38 AM Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> Rebased.
>
>
> I started reviewing the patch, I didn't finish my review yet.
> Following are some of the comments.

Thank you for reviewing the patch.

>
> +    <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
> +    <listitem>
> +     <para>
> +      Execute index vacuum and cleanup index in parallel with
>
> I doubt that user can understand the terms index vacuum and cleanup index.
> May be it needs some more detailed information.
>

Agreed. Table 27.22 "Vacuum phases" has a good description of the
vacuum phases, so maybe adding a reference to it would work.

OK.
 
>
> -typedef enum VacuumOption
> +typedef enum VacuumOptionFlag
>  {
>
> I don't find the new name quite good, how about VacuumFlags?
>

Agreed with removing "Option" from the name but I think VacuumFlag
would be better because this enum represents only one flag. Thoughts?

OK.
 

> postgres=# vacuum (parallel 2, verbose) tbl;
>
> With verbose, no parallel workers related information is available.
> I feel giving that information is required even when it is not parallel
> vacuum also.
>

Agreed. How about the following verbose output? I've added the number
of launched, planned, and requested vacuum workers and the purpose
(vacuum or cleanup).

postgres(1:91536)=# vacuum (verbose, parallel 30) test; -- table
'test' has 3 indexes
[...]
 
The verbose output is good.

Since the previous patch conflicts with commit 285d8e12, I've attached the
latest version patch, which incorporates the review comments I got.

Thanks for the latest patch. I have some more minor comments.

+      Execute index vacuum and cleanup index in parallel with

Better to use vacuum index and cleanup index? That is consistent with
the description of the vacuum phases; it is better to follow the same
notation throughout the patch.


+ dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers);

With this change, lazy_space_alloc also takes care of initializing the
parallel vacuum; can we write something about that in the comments?


+ initprog_val[2] = dead_tuples->max_dead_tuples;

The dead_tuples variable may need a rename for better readability?



+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);
+ else
+ copy_result = true;


I don't see a need for the copy_result variable; how about directly using
the updated flag to decide whether to copy or not? Once the result is
copied, update the flag.


+use Test::More tests => 34;

I don't find any new tests added in this patch.

I am wondering about the performance penalty if we use the parallel option
of vacuum on a small table.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Jan 22, 2019 at 9:59 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> Thanks for the latest patch. I have some more minor comments.

Thank you for reviewing the patch.

>
> +      Execute index vacuum and cleanup index in parallel with
>
> Better to use vacuum index and cleanup index? This is in same with
> the description of vacuum phases. It is better to follow same notation
> in the patch.

Agreed. I've changed it to "Vacuum index and cleanup index in parallel
with ...".

>
>
> + dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers);
>
> With the change, the lazy_space_alloc takes care of initializing the
> parallel vacuum, can we write something related to that in the comments.
>

Agreed.

>
> + initprog_val[2] = dead_tuples->max_dead_tuples;
>
> dead_tuples variable may need rename for better reading?
>

I might not have understood your comment correctly, but I've tried to fix
it. Please review it.

>
>
> + if (lvshared->indstats[idx].updated)
> + result = &(lvshared->indstats[idx].stats);
> + else
> + copy_result = true;
>
>
> I don't see a need for copy_result variable, how about directly using
> the updated flag to decide whether to copy or not? Once the result is
> copied update the flag.
>

You're right. Fixed.

>
> +use Test::More tests => 34;
>
> I don't find any new tetst are added in this patch.

Fixed.

>
> I am thinking of performance penalty if we use the parallel option of
> vacuum on a small sized table?

Hmm, unlike most parallel operations (Parallel Append aside), parallel
vacuum executes multiple index vacuums simultaneously, with each index
handled by one process, so it can avoid contention. I think there is a
performance penalty, but it would not be big.

Attached the latest patches.




Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date:

On Thu, Jan 24, 2019 at 1:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Attached the latest patches.

Thanks for the updated patches.
Some more code review comments.

+         started by a single utility command.  Currently, the parallel
+         utility commands that support the use of parallel workers are
+         <command>CREATE INDEX</command> and <command>VACUUM</command>
+         without <literal>FULL</literal> option, and only when building
+         a B-tree index.  Parallel workers are taken from the pool of


I feel the above sentence may not give the proper picture; how about
adding the following modification?

<command>CREATE INDEX</command> only when building a B-tree index 
and <command>VACUUM</command> without <literal>FULL</literal> option.



+ * parallel vacuum, we perform both index vacuum and index cleanup in parallel.
+ * Individual indexes is processed by one vacuum process. At beginning of

How about vacuum index and cleanup index, similar to other places?


+ * memory space for dead tuples. When starting either index vacuum or cleanup
+ * vacuum, we launch parallel worker processes. Once all indexes are processed

same here as well?


+ * Before starting parallel index vacuum and parallel cleanup index we launch
+ * parallel workers. All parallel workers will exit after processed all indexes

parallel vacuum index and parallel cleanup index?


+ /*
+ * If there is already-updated result in the shared memory we
+ * use it. Otherwise we pass NULL to index AMs and copy the
+ * result to the shared memory segment.
+ */
+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);

I didn't really see a need for the flag to differentiate the stats pointer
between the first run and the second run. I don't see any problem in
passing the stats directly; the same stats are updated on both the worker
side and the leader side, and no two processes will vacuum the same index
at the same time. Am I missing something?

Even if this flag is meant to identify whether the stats have been updated
before writing them, I don't see a need for it compared to normal vacuum.


+ * Enter the parallel mode, allocate and initialize a DSM segment. Return
+ * the memory space for storing dead tuples or NULL if no workers are prepared.
+ */

+ pcxt = CreateParallelContext("postgres", "heap_parallel_vacuum_main",
+ request, true);

But we are passing the serializable_okay flag as true, which means it
doesn't return NULL. Is that expected?


+ initStringInfo(&buf);
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+   "launched %d parallel vacuum workers %s (planned: %d",
+   lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);

What is the difference between planned workers and requested workers?
Aren't they the same?


- COMPARE_SCALAR_FIELD(options);
- COMPARE_NODE_FIELD(rels);
+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;

Options is changed from a SCALAR compare to explicit checks, but why is
the rels check removed? Since options changed from an int to a structure,
using SCALAR may not work in other functions like _copyVacuumStmt, etc.


+typedef struct VacuumOptions
+{
+ VacuumFlag flags; /* OR of VacuumFlag */
+ int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;


Do we need to add a NodeTag to the above structure, since it is part of
the VacuumStmt structure?


+        <application>vacuumdb</application> will require background workers,
+        so make sure your <xref linkend="guc-max-parallel-workers-maintenance"/>
+        setting is more than one.

How about removing vacuumdb and changing it to "This option will ..."?

I will continue the testing of this patch and share the details. 

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> On Thu, Jan 24, 2019 at 1:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> Attached the latest patches.
>
>
> Thanks for the updated patches.
> Some more code review comments.
>

Thank you!

> +         started by a single utility command.  Currently, the parallel
> +         utility commands that support the use of parallel workers are
> +         <command>CREATE INDEX</command> and <command>VACUUM</command>
> +         without <literal>FULL</literal> option, and only when building
> +         a B-tree index.  Parallel workers are taken from the pool of
>
>
> I feel the above sentence may not give the proper picture, how about the
> adding following modification?
>
> <command>CREATE INDEX</command> only when building a B-tree index
> and <command>VACUUM</command> without <literal>FULL</literal> option.
>
>

Agreed.

>
> + * parallel vacuum, we perform both index vacuum and index cleanup in parallel.
> + * Individual indexes is processed by one vacuum process. At beginning of
>
> How about vacuum index and cleanup index similar like other places?
>
>
> + * memory space for dead tuples. When starting either index vacuum or cleanup
> + * vacuum, we launch parallel worker processes. Once all indexes are processed
>
> same here as well?
>
>
> + * Before starting parallel index vacuum and parallel cleanup index we launch
> + * parallel workers. All parallel workers will exit after processed all indexes
>
> parallel vacuum index and parallel cleanup index?
>
>

ISTM we use terms like "index vacuuming", "index cleanup" and "FSM
vacuuming" in vacuumlazy.c, so maybe "parallel index vacuuming" and
"parallel index cleanup" would be better?

> + /*
> + * If there is already-updated result in the shared memory we
> + * use it. Otherwise we pass NULL to index AMs and copy the
> + * result to the shared memory segment.
> + */
> + if (lvshared->indstats[idx].updated)
> + result = &(lvshared->indstats[idx].stats);
>
> I didn't really find a need of the flag to differentiate the stats pointer from
> first run to second run? I don't see any problem in passing directing the stats
> and the same stats are updated in the worker side and leader side. Anyway no two
> processes will do the index vacuum at same time. Am I missing something?
>
> Even if this flag is to identify whether the stats are updated or not before
> writing them, I don't see a need of it compared to normal vacuum.
>

Passing stats = NULL to amvacuumcleanup and ambulkdelete means it is the
first-time execution. For example, btvacuumcleanup skips the cleanup scan
if stats is not NULL. In a normal vacuum we pass NULL to ambulkdelete or
amvacuumcleanup on the first call, and they store the result stats in
locally allocated memory. Therefore, in a parallel vacuum I think both the
workers and the leader need to move the result to shared memory and mark
it as updated, since a different worker could vacuum the same index the
next time.
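
To make the flow concrete, here is a rough sketch of the per-index step
being described (illustrative only, not the patch code; ivinfo, idx,
for_cleanup, dead_tuples and lazy_tid_reaped are placeholder names in the
spirit of vacuumlazy.c, while indstats/updated follow the excerpts quoted
above):

    /* Choose what to hand to the index AM: NULL means "first call". */
    IndexBulkDeleteResult *stats;
    IndexBulkDeleteResult *result;

    stats = lvshared->indstats[idx].updated
        ? &(lvshared->indstats[idx].stats)  /* result of an earlier call */
        : NULL;                             /* first call for this index */

    result = for_cleanup
        ? index_vacuum_cleanup(&ivinfo, stats)
        : index_bulk_delete(&ivinfo, stats, lazy_tid_reaped,
                            (void *) dead_tuples);

    /*
     * A palloc'd result from a first call must be visible to whichever
     * process handles this index next, so copy it into the shared slot.
     */
    if (result != NULL && !lvshared->indstats[idx].updated)
    {
        memcpy(&(lvshared->indstats[idx].stats), result,
               sizeof(IndexBulkDeleteResult));
        lvshared->indstats[idx].updated = true;
    }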

>
> + * Enter the parallel mode, allocate and initialize a DSM segment. Return
> + * the memory space for storing dead tuples or NULL if no workers are prepared.
> + */
>
> + pcxt = CreateParallelContext("postgres", "heap_parallel_vacuum_main",
> + request, true);
>
> But we are passing as serializable_okay flag as true, means it doesn't return
> NULL. Is it expected?
>
>

I think you're right. Since the request is never 0 and serializable_okay
is true, it should not return NULL. Will fix.

> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> +   "launched %d parallel vacuum workers %s (planned: %d",
> +   lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
>
> what is the difference between planned workers and requested workers, aren't both
> are same?

The requested value is the parallel degree specified explicitly by the
user, whereas the planned value is the actual number we plan to use based
on the number of indexes the table has. For example, if we run 'VACUUM
(PARALLEL 3000) tbl' where tbl has 4 indexes, the request is 3000 and the
planned number is 4. And if max_parallel_maintenance_workers is 2, the
planned number is 2.
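
In other words, a simplified sketch of the relationship (not the exact
function in the patch):

    /*
     * The planned number of workers is capped by the number of indexes
     * and by max_parallel_maintenance_workers; an explicit PARALLEL N
     * request only lowers it further.
     */
    static int
    compute_parallel_workers_sketch(int nindexes, int nrequested)
    {
        int     nworkers = nindexes;    /* at most one worker per index */

        if (nrequested > 0)
            nworkers = Min(nworkers, nrequested);

        return Min(nworkers, max_parallel_maintenance_workers);
    }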

>
>
> - COMPARE_SCALAR_FIELD(options);
> - COMPARE_NODE_FIELD(rels);
> + if (a->options.flags != b->options.flags)
> + return false;
> + if (a->options.nworkers != b->options.nworkers)
> + return false;
>
> Options is changed from SCALAR to check, but why the rels check is removed?
> The options is changed from int to a structure so using SCALAR may not work
> in other function like _copyVacuumStmt and etc?

Agreed and will fix.

>
> +typedef struct VacuumOptions
> +{
> + VacuumFlag flags; /* OR of VacuumFlag */
> + int nworkers; /* # of parallel vacuum workers */
> +} VacuumOptions;
>
>
> Do we need to add NodeTag for the above structure? Because this structure is
> part of VacuumStmt structure.

Yes, I will add it.
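
For illustration only (the tag name T_VacuumOptions is hypothetical here),
the tagged struct would look roughly like:

    typedef struct VacuumOptions
    {
        NodeTag     type;       /* would become e.g. T_VacuumOptions */
        VacuumFlag  flags;      /* OR of VacuumFlag bits */
        int         nworkers;   /* # of parallel vacuum workers */
    } VacuumOptions;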

>
>
> +        <application>vacuumdb</application> will require background workers,
> +        so make sure your <xref linkend="guc-max-parallel-workers-maintenance"/>
> +        setting is more than one.
>
> removing vacuumdb and changing it as "This option will ..."?
>
Agreed.

> I will continue the testing of this patch and share the details.
>

Thank you. I'll submit the updated patch set.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Feb 1, 2019 at 2:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> Thank you. I'll submit the updated patch set.
>

I don't see any chance of getting this committed in the next few days,
so, moved to next CF.   Thanks for working on this and I hope you will
continue work on this project.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, Feb 2, 2019 at 4:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 1, 2019 at 2:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> >
> > Thank you. I'll submit the updated patch set.
> >
>
> I don't see any chance of getting this committed in the next few days,
> so, moved to next CF.   Thanks for working on this and I hope you will
> continue work on this project.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Jan 31, 2019 at 10:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Thank you. I'll submit the updated patch set.
>

Attached the latest patch set.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date:

On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
>
>
> + * Before starting parallel index vacuum and parallel cleanup index we launch
> + * parallel workers. All parallel workers will exit after processed all indexes
>
> parallel vacuum index and parallel cleanup index?
>
>

ISTM we use terms like "index vacuuming", "index cleanup" and "FSM
vacuuming" in vacuumlazy.c, so maybe "parallel index vacuuming" and
"parallel index cleanup" would be better?

OK.
 
> + /*
> + * If there is already-updated result in the shared memory we
> + * use it. Otherwise we pass NULL to index AMs and copy the
> + * result to the shared memory segment.
> + */
> + if (lvshared->indstats[idx].updated)
> + result = &(lvshared->indstats[idx].stats);
>
> I didn't really find a need of the flag to differentiate the stats pointer from
> first run to second run? I don't see any problem in passing directing the stats
> and the same stats are updated in the worker side and leader side. Anyway no two
> processes will do the index vacuum at same time. Am I missing something?
>
> Even if this flag is to identify whether the stats are updated or not before
> writing them, I don't see a need of it compared to normal vacuum.
>

Passing stats = NULL to amvacuumcleanup and ambulkdelete means it is the
first-time execution. For example, btvacuumcleanup skips the cleanup scan
if stats is not NULL. In a normal vacuum we pass NULL to ambulkdelete or
amvacuumcleanup on the first call, and they store the result stats in
locally allocated memory. Therefore, in a parallel vacuum I think both the
workers and the leader need to move the result to shared memory and mark
it as updated, since a different worker could vacuum the same index the
next time.

OK, understood the point. But btbulkdelete allocates the memory whenever
the stats are NULL, so I don't see a problem with it.

The only problem is with btvacuumcleanup: when there are no dead tuples in
the table, btbulkdelete is not called and btvacuumcleanup is called
directly at the end of the vacuum, and in that scenario the code flow
differs based on the stats. So why can't we use the dead tuples number to
differentiate, instead of adding another flag? Also, this scenario is not
very frequent, so avoiding the memcpy for normal operations would be
better. It may be a small gain; I just thought of it.
 

> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> +   "launched %d parallel vacuum workers %s (planned: %d",
> +   lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
>
> what is the difference between planned workers and requested workers, aren't both
> are same?

The requested value is the parallel degree specified explicitly by the
user, whereas the planned value is the actual number we plan to use based
on the number of indexes the table has. For example, if we run 'VACUUM
(PARALLEL 3000) tbl' where tbl has 4 indexes, the request is 3000 and the
planned number is 4. And if max_parallel_maintenance_workers is 2, the
planned number is 2.

OK.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> first time execution. For example, btvacuumcleanup skips cleanup if
>> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> amvacuumcleanup when the first time calling. And they store the result
>> stats to the memory allocated int the local memory. Therefore in the
>> parallel vacuum I think that both worker and leader need to move it to
>> the shared memory and mark it as updated as different worker could
>> vacuum different indexes at the next time.
>
>
> OK, understood the point. But for btbulkdelete whenever the stats are NULL,
> it allocates the memory. So I don't see a problem with it.
>
> The only problem is with btvacuumcleanup, when there are no dead tuples
> present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
> is called at the end of vacuum, in that scenario, there is code flow difference
> based on the stats. so why can't we use the deadtuples number to differentiate
> instead of adding another flag?

I don't understand your suggestion. What do we compare deadtuples
number to? Could you elaborate on that please?

> And also this scenario is not very often, so avoiding
> memcpy for normal operations would be better. It may be a small gain, just
> thought of it.
>

This scenario could happen periodically on an insert-only table. The
additional memcpy is executed once per index per vacuum, but I agree that
avoiding the memcpy would be good.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date:

On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> first time execution. For example, btvacuumcleanup skips cleanup if
>> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> amvacuumcleanup when the first time calling. And they store the result
>> stats to the memory allocated int the local memory. Therefore in the
>> parallel vacuum I think that both worker and leader need to move it to
>> the shared memory and mark it as updated as different worker could
>> vacuum different indexes at the next time.
>
>
> OK, understood the point. But for btbulkdelete whenever the stats are NULL,
> it allocates the memory. So I don't see a problem with it.
>
> The only problem is with btvacuumcleanup, when there are no dead tuples
> present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
> is called at the end of vacuum, in that scenario, there is code flow difference
> based on the stats. so why can't we use the deadtuples number to differentiate
> instead of adding another flag?

I don't understand your suggestion. What do we compare deadtuples
number to? Could you elaborate on that please?

The scenario where we should pass NULL stats to the btvacuumcleanup
function is when there are no dead tuples; I just think that we may be
able to use the deadtuples structure to find out whether the stats should
be NULL or not, while avoiding the extra memcpy.
 
> And also this scenario is not very often, so avoiding
> memcpy for normal operations would be better. It may be a small gain, just
> thought of it.
>

This scenario could happen periodically on an insert-only table. The
additional memcpy is executed once per index per vacuum, but I agree that
avoiding the memcpy would be good.

Yes, understood. If possible, removing the need for the memcpy would be good.
The latest patch doesn't apply anymore and needs a rebase.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> >
>> >
>> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >>
>> >>
>> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> >> first time execution. For example, btvacuumcleanup skips cleanup if
>> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> >> amvacuumcleanup when the first time calling. And they store the result
>> >> stats to the memory allocated int the local memory. Therefore in the
>> >> parallel vacuum I think that both worker and leader need to move it to
>> >> the shared memory and mark it as updated as different worker could
>> >> vacuum different indexes at the next time.
>> >
>> >
>> > OK, understood the point. But for btbulkdelete whenever the stats are NULL,
>> > it allocates the memory. So I don't see a problem with it.
>> >
>> > The only problem is with btvacuumcleanup, when there are no dead tuples
>> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
>> > is called at the end of vacuum, in that scenario, there is code flow difference
>> > based on the stats. so why can't we use the deadtuples number to differentiate
>> > instead of adding another flag?
>>
>> I don't understand your suggestion. What do we compare deadtuples
>> number to? Could you elaborate on that please?
>
>
> The scenario where the stats should pass NULL to btvacuumcleanup function is
> when there no dead tuples, I just think that we may use that deadtuples structure
> to find out whether stats should pass NULL or not while avoiding the extra
> memcpy.
>

Thank you for your explanation. I understood. Maybe I'm worrying too
much, but I'm concerned about compatibility; currently we handle indexes
individually. So if there is an index access method whose ambulkdelete
returns NULL on the first call but returns a palloc'd struct on the
second or a later call, that wouldn't work correctly.

The documentation says that the passed-in 'stats' is NULL on the first
call of ambulkdelete, but it doesn't say anything about subsequent calls.
Index access methods may expect that the passed-in 'stats' is the same
as what they returned last time. So I think we should add an extra flag
to keep compatibility.
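
To show what the extra flag buys us, here is a simplified sketch of the
shared per-index slot (the struct name is illustrative; the fields follow
the patch excerpts quoted earlier in the thread):

    typedef struct LVSharedIndStats
    {
        bool                  updated;  /* has an AM already returned a
                                         * result for this index? */
        IndexBulkDeleteResult stats;    /* that result, copied here by
                                         * whichever process ran the AM */
    } LVSharedIndStats;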

>>
>> > And also this scenario is not very often, so avoiding
>> > memcpy for normal operations would be better. It may be a small gain, just
>> > thought of it.
>> >
>>
>> This scenario could happen periodically on an insert-only table.
>> Additional memcpy is executed once per indexes in a vacuuming but I
>> agree that the avoiding memcpy would be good.
>
>
> Yes, understood. If possible removing the need of memcpy would be good.
> The latest patch doesn't apply anymore. Needs a rebase.
>

Thank you. Attached the rebased patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date:

On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> >
>> >
>> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >>
>> >>
>> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> >> first time execution. For example, btvacuumcleanup skips cleanup if
>> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> >> amvacuumcleanup when the first time calling. And they store the result
>> >> stats to the memory allocated int the local memory. Therefore in the
>> >> parallel vacuum I think that both worker and leader need to move it to
>> >> the shared memory and mark it as updated as different worker could
>> >> vacuum different indexes at the next time.
>> >
>> >
>> > OK, understood the point. But for btbulkdelete whenever the stats are NULL,
>> > it allocates the memory. So I don't see a problem with it.
>> >
>> > The only problem is with btvacuumcleanup, when there are no dead tuples
>> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
>> > is called at the end of vacuum, in that scenario, there is code flow difference
>> > based on the stats. so why can't we use the deadtuples number to differentiate
>> > instead of adding another flag?
>>
>> I don't understand your suggestion. What do we compare deadtuples
>> number to? Could you elaborate on that please?
>
>
> The scenario where the stats should pass NULL to btvacuumcleanup function is
> when there no dead tuples, I just think that we may use that deadtuples structure
> to find out whether stats should pass NULL or not while avoiding the extra
> memcpy.
>

Thank you for your explanation. I understood. Maybe I'm worrying too
much, but I'm concerned about compatibility; currently we handle indexes
individually. So if there is an index access method whose ambulkdelete
returns NULL on the first call but returns a palloc'd struct on the
second or a later call, that wouldn't work correctly.

The documentation says that the passed-in 'stats' is NULL on the first
call of ambulkdelete, but it doesn't say anything about subsequent calls.
Index access methods may expect that the passed-in 'stats' is the same
as what they returned last time. So I think we should add an extra flag
to keep compatibility.

I checked some of the ambulkdelete functions, and they do not return NULL
whenever they are called, but the palloc'd structure doesn't get filled
with the details.

IMO, there is no need for any extra code in parallel vacuum compared to
normal vacuum.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date:
On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you. Attached the rebased patch.

I ran some performance tests to compare the parallelism benefits, but I
got some strange results showing performance overhead; maybe that is
because I tested it on my laptop.

FYI,

Table schema:

create table tbl(f1 int, f2 char(100), f3 float4, f4 char(100), f5 float8, f6 char(100), f7 bigint);


Tbl with 3 indexes

1000 record deletion
master - 22ms
patch - 25ms with 0 parallel workers
patch - 43ms with 1 parallel worker
patch - 72ms with 2 parallel workers


10000 record deletion
master - 52ms
patch - 56ms with 0 parallel workers
patch - 79ms with 1 parallel worker
patch - 86ms with 2 parallel workers


100000 record deletion
master - 410ms
patch - 379ms with 0 parallel workers
patch - 330ms with 1 parallel worker
patch - 289ms with 2 parallel workers


Tbl with 5 indexes

1000 record deletion
master - 28ms
patch - 34ms with 0 parallel workers
patch - 86ms with 2 parallel workers
patch - 106ms with 4 parallel workers


10000 record deletion
master - 58ms
patch - 63ms with 0 parallel workers
patch - 101ms with 2 parallel workers
patch - 118ms with 4 parallel workers


100000 record deletion
master - 632ms
patch - 490ms with 0 parallel workers
patch - 455ms with 2 parallel workers
patch - 403ms with 4 parallel workers



Tbl with 7 indexes

1000 record deletion
master - 35ms
patch - 44ms with 0 parallel workers
patch - 93ms with 2 parallel workers
patch - 110ms with 4 parallel workers
patch - 123ms with 6 parallel workers

10000 record deletion
master - 76ms
patch - 78ms with 0 parallel workers
patch - 135ms with 2 parallel workers
patch - 143ms with 4 parallel workers
patch - 139ms with 6 parallel workers

100000 record deletion
master - 641ms
patch - 656ms with 0 parallel workers
patch - 613ms with 2 parallel workers
patch - 735ms with 4 parallel workers
patch - 679ms with 6 parallel workers


Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> Thank you. Attached the rebased patch.
>
>
> I ran some performance tests to compare the parallelism benefits,

Thank you for testing!

> but I got some strange results of performance overhead, may be it is
> because, I tested it on my laptop.

Hmm, I think parallel vacuum would help for heavy workloads, like a big
table with multiple indexes. In your test results, all executions
completed within 1 second, which seems to be a use case that parallel
vacuum wouldn't help. I suspect that the table is small, right? Anyway,
I'll also do performance tests.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, Feb 23, 2019 at 10:28 PM Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> >
>> >
>> > On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >>
>> >> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> >> >
>> >> >
>> >> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >> >>
>> >> >>
>> >> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> >> >> first time execution. For example, btvacuumcleanup skips cleanup if
>> >> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> >> >> amvacuumcleanup when the first time calling. And they store the result
>> >> >> stats to the memory allocated int the local memory. Therefore in the
>> >> >> parallel vacuum I think that both worker and leader need to move it to
>> >> >> the shared memory and mark it as updated as different worker could
>> >> >> vacuum different indexes at the next time.
>> >> >
>> >> >
>> >> > OK, understood the point. But for btbulkdelete whenever the stats are NULL,
>> >> > it allocates the memory. So I don't see a problem with it.
>> >> >
>> >> > The only problem is with btvacuumcleanup, when there are no dead tuples
>> >> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
>> >> > is called at the end of vacuum, in that scenario, there is code flow difference
>> >> > based on the stats. so why can't we use the deadtuples number to differentiate
>> >> > instead of adding another flag?
>> >>
>> >> I don't understand your suggestion. What do we compare deadtuples
>> >> number to? Could you elaborate on that please?
>> >
>> >
>> > The scenario where the stats should pass NULL to btvacuumcleanup function is
>> > when there no dead tuples, I just think that we may use that deadtuples structure
>> > to find out whether stats should pass NULL or not while avoiding the extra
>> > memcpy.
>> >
>>
>> Thank you for your explanation. I understood. Maybe I'm worrying too
>> much but I'm concernced compatibility; currently we handle indexes
>> individually. So if there is an index access method whose ambulkdelete
>> returns NULL at the first call but returns a palloc'd struct at the
>> second time or other, that doesn't work fine.
>>
>> The documentation says that passed-in 'stats' is NULL at the first
>> time call of ambulkdelete but doesn't say about the second time or
>> more. Index access methods may expect that the passed-in 'stats'  is
>> the same as what they has returned last time. So I think to add an
>> extra flag for keeping comptibility.
>
>
> I checked some of the ambulkdelete functions, and they are not returning
> a NULL data whenever those functions are called. But the palloc'd structure
> doesn't get filled with the details.
>
> IMO, there is no need of any extra code that is required for parallel vacuum
> compared to normal vacuum.
>

Hmm, I think this code is necessary to faithfully keep the current index
vacuum behavior, especially the communication between lazy vacuum and
IAMs, as it is. The in-core IAMs don't rely on that, but third-party AMs
might, and new ones might be developed in the future. On the other hand, I
can understand your concern; if such IAMs are quite rare, we might not
need to complicate the code needlessly. I'd like to hear more opinions
from other hackers as well.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you. Attached the rebased patch.

Here are some review comments.

+         started by a single utility command.  Currently, the parallel
+         utility commands that support the use of parallel workers are
+         <command>CREATE INDEX</command> and <command>VACUUM</command>
+         without <literal>FULL</literal> option, and only when building
+         a B-tree index.  Parallel workers are taken from the pool of

That sentence is garbled.  The end part about b-tree indexes applies
only to CREATE INDEX, not to VACUUM, since VACUUM doesn't build indexes.

+      Vacuum index and cleanup index in parallel
+      <replaceable class="parameter">N</replaceable> background
workers (for the detail
+      of each vacuum phases, please refer to <xref
linkend="vacuum-phases"/>. If the

I have two problems with this.  One is that I can't understand the
English very well. I think you mean something like: "Perform the
'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
N background workers," but I'm not entirely sure.  The other is that
if that is what you mean, I don't think it's a sufficient description.
Users need to understand whether, for example, only one worker can be
used per index, or whether the work for a single index can be split
across workers.

+      parallel degree <replaceable class="parameter">N</replaceable>
is omitted,
+      then <command>VACUUM</command> decides the number of workers based on
+      number of indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if
this option

Now this makes it sound like it's one worker per index, but you could
be more explicit about it.

+      is specified multile times, the last parallel degree
+      <replaceable class="parameter">N</replaceable> is considered
into the account.

Typo, but I'd just delete this sentence altogether; the behavior if
the option is multiply specified seems like a triviality that need not
be documented.

+    Setting a value for <literal>parallel_workers</literal> via
+    <xref linkend="sql-altertable"/> also controls how many parallel
+    worker processes will be requested by a <command>VACUUM</command>
+    against the table. This setting is overwritten by setting
+    <replaceable class="parameter">N</replaceable> of
<literal>PARALLEL</literal>
+    option.

I wonder if we really want this behavior.  Should a setting that
controls the degree of parallelism when scanning the table also affect
VACUUM?  I tend to think that we probably don't ever want VACUUM of a
table to be parallel by default, but rather something that the user
must explicitly request.  Happy to hear other opinions.  If we do want
this behavior, I think this should be written differently, something
like this: The PARALLEL N option to VACUUM takes precedence over this
option.

+ * parallel mode nor destories the parallel context. For updating the index

Spelling.

+/* DSM keys for parallel lazy vacuum */
+#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001)
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002)
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003)

Any special reason not to use just 1, 2, 3 here?  The general
infrastructure stuff uses high numbers to avoid conflicting with
plan_node_id values, but end clients of the parallel infrastructure
can generally just use small integers.
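
For example (just to spell out the suggestion), the keys could simply be:

    /* DSM keys for parallel lazy vacuum; small integers suffice here */
    #define PARALLEL_VACUUM_KEY_SHARED          1
    #define PARALLEL_VACUUM_KEY_DEAD_TUPLES     2
    #define PARALLEL_VACUUM_KEY_QUERY_TEXT      3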

+ bool updated; /* is the stats updated? */

is -> are

+ * LVDeadTuples controls the dead tuple TIDs collected during heap scan.

what do you mean by "controls", exactly? stores?

+ * This is allocated in a dynamic shared memory segment when parallel
+ * lazy vacuum mode, or allocated in a local memory.

If this is in DSM, then max_tuples is a wart, I think.  We can't grow
the segment at that point.  I'm suspicious that we need a better
design here.  It looks like you gather all of the dead tuples in
backend-local memory and then allocate an equal amount of DSM to copy
them.  But that means that we are using twice as much memory, which
seems pretty bad.  You'd have to do that at least momentarily no
matter what, but it's not obvious that the backend-local copy is ever
freed.  There's another patch kicking around to allocate memory for
vacuum in chunks rather than preallocating the whole slab of memory at
once; we might want to think about getting that committed first and
then having this build on top of it.  At least we need something
smarter than this.

-heap_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,

We generally avoid passing a struct by value; copying the struct can
be expensive and having multiple shallow copies of the same data
sometimes leads to surprising results.  I think it might be a good
idea to propose a preliminary refactoring patch that invents
VacuumOptions and gives it just a single 'int' member and refactors
everything to use it, and then that can be committed first.  It should
pass a pointer, though, not the actual struct.
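
Something along these lines, as a sketch of the suggested preliminary
refactoring (the signature shown here is illustrative, not a finished
proposal):

    typedef struct VacuumOptions
    {
        int     flags;      /* OR of VACOPT_* flags, same values as today */
    } VacuumOptions;

    extern void heap_vacuum_rel(Relation onerel, VacuumOptions *options,
                                VacuumParams *params,
                                BufferAccessStrategy bstrategy);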

+ LVState    *lvstate;

It's not clear to me why we need this new LVState thing.  What's the
motivation for that?  If it's a good idea, could it be done as a
separate, preparatory patch?  It seems to be responsible for a lot of
code churn in this patch.   It also leads to strange stuff like this:

  ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
+ (errmsg("scanned index \"%s\" to remove %d row versions %s",
  RelationGetRelationName(indrel),
- vacrelstats->num_dead_tuples),
+ dead_tuples->num_tuples,
+ IsParallelWorker() ? "by parallel vacuum worker" : ""),

This doesn't seem to be great grammar, and translation guidelines
generally discourage this sort of incremental message construction
quite strongly.  Since the user can probably infer what happened by a
suitable choice of log_line_prefix, I'm not totally sure this is worth
doing in the first place, but if we're going to do it, it should
probably have two completely separate message strings and pick between
them using IsParallelWorker(), rather than building it up
incrementally like this.
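
For instance, a translation-friendly version of the message above could
look like this (sketch only, reusing the patch's dead_tuples->num_tuples
field):

    if (IsParallelWorker())
        ereport(elevel,
                (errmsg("scanned index \"%s\" to remove %d row versions by parallel vacuum worker",
                        RelationGetRelationName(indrel),
                        dead_tuples->num_tuples)));
    else
        ereport(elevel,
                (errmsg("scanned index \"%s\" to remove %d row versions",
                        RelationGetRelationName(indrel),
                        dead_tuples->num_tuples)));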

+compute_parallel_workers(Relation rel, int nrequests, int nindexes)

I think 'nrequests' is meant to be 'nrequested'.  It isn't the number
of requests; it's the number of workers that were requested.

+ /* quick exit if no workers are prepared, e.g. under serializable isolation */

That comment makes very little sense in this context.

+ /* Report parallel vacuum worker information */
+ initStringInfo(&buf);
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+   "launched %d parallel vacuum workers %s (planned: %d",
+   lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
+
+ appendStringInfo(&buf, ")");
+ ereport(elevel, (errmsg("%s", buf.data)));

This is another example of incremental message construction, again
violating translation guidelines.

+ WaitForParallelWorkersToAttach(lvstate->pcxt);

Why?

+ /*
+ * If there is already-updated result in the shared memory we use it.
+ * Otherwise we pass NULL to index AMs, meaning it's first time call,
+ * and copy the result to the shared memory segment.
+ */

I'm probably missing something here, but isn't the intention that we
only do each index once?  If so, how would there be anything there
already?  Once from for_cleanup = false and once for for_cleanup =
true?

+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;

You could just do COMPARE_SCALAR_FIELD(options.flags);
COMPARE_SCALAR_FIELD(options.nworkers);

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Feb 28, 2019 at 2:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Thank you. Attached the rebased patch.
>
> Here are some review comments.

Thank you for reviewing the patches!

>
> +         started by a single utility command.  Currently, the parallel
> +         utility commands that support the use of parallel workers are
> +         <command>CREATE INDEX</command> and <command>VACUUM</command>
> +         without <literal>FULL</literal> option, and only when building
> +         a B-tree index.  Parallel workers are taken from the pool of
>
> That sentence is garbled.  The end part about b-tree indexes applies
> only to CREATE INDEX, not to VACUUM, since VACUUM does build indexes.

Fixed.

>
> +      Vacuum index and cleanup index in parallel
> +      <replaceable class="parameter">N</replaceable> background
> workers (for the detail
> +      of each vacuum phases, please refer to <xref
> linkend="vacuum-phases"/>. If the
>
> I have two problems with this.  One is that I can't understand the
> English very well. I think you mean something like: "Perform the
> 'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
> N background workers," but I'm not entirely sure.  The other is that
> if that is what you mean, I don't think it's a sufficient description.
> Users need to understand whether, for example, only one worker can be
> used per index, or whether the work for a single index can be split
> across workers.
>
> +      parallel degree <replaceable class="parameter">N</replaceable>
> is omitted,
> +      then <command>VACUUM</command> decides the number of workers based on
> +      number of indexes on the relation which further limited by
> +      <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if
> this option
>
> Now this makes it sound like it's one worker per index, but you could
> be more explicit about it.

Fixed.

>
> +      is specified multile times, the last parallel degree
> +      <replaceable class="parameter">N</replaceable> is considered
> into the account.
>
> Typo, but I'd just delete this sentence altogether; the behavior if
> the option is multiply specified seems like a triviality that need not
> be documented.

Understood, removed.

>
> +    Setting a value for <literal>parallel_workers</literal> via
> +    <xref linkend="sql-altertable"/> also controls how many parallel
> +    worker processes will be requested by a <command>VACUUM</command>
> +    against the table. This setting is overwritten by setting
> +    <replaceable class="parameter">N</replaceable> of
> <literal>PARALLEL</literal>
> +    option.
>
> I wonder if we really want this behavior.  Should a setting that
> controls the degree of parallelism when scanning the table also affect
> VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> table to be parallel by default, but rather something that the user
> must explicitly request.  Happy to hear other opinions.  If we do want
> this behavior, I think this should be written differently, something
> like this: The PARALLEL N option to VACUUM takes precedence over this
> option.

For example, I can imagine a use case where a batch job does parallel
vacuum on some tables in a maintenance window. The batch operation would
need to compute and specify the degree of parallelism every time,
according to, for instance, the number of indexes, which would be
troublesome. But if we can set the degree of parallelism for each table,
it can just do 'VACUUM (PARALLEL)'.

>
> + * parallel mode nor destories the parallel context. For updating the index
>
> Spelling.

Fixed.

>
> +/* DSM keys for parallel lazy vacuum */
> +#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001)
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002)
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003)
>
> Any special reason not to use just 1, 2, 3 here?  The general
> infrastructure stuff uses high numbers to avoid conflicting with
> plan_node_id values, but end clients of the parallel infrastructure
> can generally just use small integers.

It seems that I was worrying unnecessarily, changed to 1, 2, 3.

>
> + bool updated; /* is the stats updated? */
>
> is -> are
>
> + * LVDeadTuples controls the dead tuple TIDs collected during heap scan.
>
> what do you mean by "controls", exactly? stores?

Fixed.

>
> + * This is allocated in a dynamic shared memory segment when parallel
> + * lazy vacuum mode, or allocated in a local memory.
>
> If this is in DSM, then max_tuples is a wart, I think.  We can't grow
> the segment at that point.  I'm suspicious that we need a better
> design here.  It looks like you gather all of the dead tuples in
> backend-local memory and then allocate an equal amount of DSM to copy
> them.  But that means that we are using twice as much memory, which
> seems pretty bad.  You'd have to do that at least momentarily no
> matter what, but it's not obvious that the backend-local copy is ever
> freed.

Hmm, the current design is simpler; only the leader process scans the heap
and saves the dead tuple TIDs to the DSM. The DSM is allocated once when
starting lazy vacuum and we never need to enlarge it. Also, we can use the
same code around heap vacuuming and dead tuple collection for both
single-process vacuum and parallel vacuum. Once index vacuuming is
completed, the leader process reinitializes the DSM and reuses it the
next time.
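
For reference, the shared dead-tuple area being described has roughly this
shape (simplified; the exact struct in the patch may differ):

    typedef struct LVDeadTuples
    {
        int             max_tuples;     /* # of slots allocated up front */
        int             num_tuples;     /* # of slots currently in use */
        ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];    /* TIDs, stored
                                                             * directly in
                                                             * the DSM */
    } LVDeadTuples;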

> There's another patch kicking around to allocate memory for
> vacuum in chunks rather than preallocating the whole slab of memory at
> once; we might want to think about getting that committed first and
> then having this build on top of it.  At least we need something
> smarter than this.

Since parallel vacuum uses memory in the same manner as single-process
vacuum, it is not made worse. I agree that that patch is smarter and this
patch could be built on top of it, but I'm concerned that there are two
proposals on that thread and the discussion has not been active for 8
months. I wonder if it would be worth improving the memory allocation
based on that patch after parallel vacuum gets committed.

>
> -heap_vacuum_rel(Relation onerel, int options, VacuumParams *params,
> +heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
>
> We generally avoid passing a struct by value; copying the struct can
> be expensive and having multiple shallow copies of the same data
> sometimes leads to surprising results.  I think it might be a good
> idea to propose a preliminary refactoring patch that invents
> VacuumOptions and gives it just a single 'int' member and refactors
> everything to use it, and then that can be committed first.  It should
> pass a pointer, though, not the actual struct.

Agreed. I'll separate patches and propose it.

>
> + LVState    *lvstate;
>
> It's not clear to me why we need this new LVState thing.  What's the
> motivation for that?  If it's a good idea, could it be done as a
> separate, preparatory patch?  It seems to be responsible for a lot of
> code churn in this patch.   It also leads to strange stuff like this:

The main motivations are refactoring and improving readability, but that
was mainly for the previous version of the patch, which implemented
parallel heap vacuum. It might no longer be needed here. I'll try to
implement this without LVState. Thank you.

>
>   ereport(elevel,
> - (errmsg("scanned index \"%s\" to remove %d row versions",
> + (errmsg("scanned index \"%s\" to remove %d row versions %s",
>   RelationGetRelationName(indrel),
> - vacrelstats->num_dead_tuples),
> + dead_tuples->num_tuples,
> + IsParallelWorker() ? "by parallel vacuum worker" : ""),
>
> This doesn't seem to be great grammar, and translation guidelines
> generally discourage this sort of incremental message construction
> quite strongly.  Since the user can probably infer what happened by a
> suitable choice of log_line_prefix, I'm not totally sure this is worth
> doing in the first place, but if we're going to do it, it should
> probably have two completely separate message strings and pick between
> them using IsParallelWorker(), rather than building it up
> incrementally like this.

Fixed.

>
> +compute_parallel_workers(Relation rel, int nrequests, int nindexes)
>
> I think 'nrequets' is meant to be 'nrequested'.  It isn't the number
> of requests; it's the number of workers that were requested.

Fixed.

>
> + /* quick exit if no workers are prepared, e.g. under serializable isolation */
>
> That comment makes very little sense in this context.

Fixed.

>
> + /* Report parallel vacuum worker information */
> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> +   "launched %d parallel vacuum workers %s (planned: %d",
> +   lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
> +
> + appendStringInfo(&buf, ")");
> + ereport(elevel, (errmsg("%s", buf.data)));
>
> This is another example of incremental message construction, again
> violating translation guidelines.

Fixed.

>
> + WaitForParallelWorkersToAttach(lvstate->pcxt);
>
> Why?

Oh not necessary, removed.

>
> + /*
> + * If there is already-updated result in the shared memory we use it.
> + * Otherwise we pass NULL to index AMs, meaning it's first time call,
> + * and copy the result to the shared memory segment.
> + */
>
> I'm probably missing something here, but isn't the intention that we
> only do each index once?  If so, how would there be anything there
> already?  Once from for_cleanup = false and once for for_cleanup =
> true?

We call ambulkdelete (for_cleanup = false) zero or more times for each
index and call amvacuumcleanup (for_cleanup = true) at the end. The first
time lazy vacuum calls either ambulkdelete or amvacuumcleanup for an
index, it must pass NULL to them. They return either a palloc'd
IndexBulkDeleteResult or NULL. If they return the former, lazy vacuum must
pass it back to them on the next call. In the current design, since there
is no guarantee that an index is always processed by the same vacuum
process, each vacuum process saves the result to the DSM in order to share
those results among the vacuum processes. The 'updated' flag indicates
that its slot is used, so we can pass the address in the DSM if 'updated'
is true, and otherwise pass NULL.

>
> + if (a->options.flags != b->options.flags)
> + return false;
> + if (a->options.nworkers != b->options.nworkers)
> + return false;
>
> You could just do COMPARE_SCALAR_FIELD(options.flags);
> COMPARE_SCALAR_FIELD(options.nworkers);

Fixed.

Almost all of the comments I got have been incorporated into my local
branch, but a few comments need discussion. I'll submit the updated
version of the patch once I have addressed all of the comments.





Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I wonder if we really want this behavior.  Should a setting that
> > controls the degree of parallelism when scanning the table also affect
> > VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> > table to be parallel by default, but rather something that the user
> > must explicitly request.  Happy to hear other opinions.  If we do want
> > this behavior, I think this should be written differently, something
> > like this: The PARALLEL N option to VACUUM takes precedence over this
> > option.
>
> For example, I can imagine a use case where a batch job does parallel
> vacuum to some tables in a maintenance window. The batch operation
> will need to compute and specify the degree of parallelism every time
> according to for instance the number of indexes, which would be
> troublesome. But if we can set the degree of parallelism for each
> tables it can just to do 'VACUUM (PARALLEL)'.

True, but the setting in question would also affect the behavior of
sequential scans and index scans.  TBH, I'm not sure that the
parallel_workers reloption is really a great design as it is: is
hard-coding the number of workers really what people want?  Do they
really want the same degree of parallelism for sequential scans and
index scans?  Why should they want the same degree of parallelism also
for VACUUM?  Maybe they do, and maybe somebody can explain why they do,
but as of now, it's not obvious to me why that should be true.

> Since the parallel vacuum uses memory in the same manner as the single
> process vacuum it's not deteriorated. I'd agree that that patch is
> more smarter and this patch can be built on top of it but I'm
> concerned that there two proposals on that thread and the discussion
> has not been active for 8 months. I wonder if  it would be worth to
> think of improving the memory allocating based on that patch after the
> parallel vacuum get committed.

Well, I think we can't just say "oh, this patch is going to use twice
as much memory as before," which is what it looks like it's doing
right now. If you think it's not doing that, can you explain further?

> Agreed. I'll separate patches and propose it.

Cool.  Probably best to keep that on this thread.

> The main motivations are refactoring and improving readability but
> it's mainly for the previous version patch which implements parallel
> heap vacuum. It might no longer need here. I'll try to implement
> without LVState. Thank you.

Oh, OK.

> > + /*
> > + * If there is already-updated result in the shared memory we use it.
> > + * Otherwise we pass NULL to index AMs, meaning it's first time call,
> > + * and copy the result to the shared memory segment.
> > + */
> >
> > I'm probably missing something here, but isn't the intention that we
> > only do each index once?  If so, how would there be anything there
> > already?  Once from for_cleanup = false and once for for_cleanup =
> > true?
>
> We call ambulkdelete (for_cleanup = false) 0 or more times for each
> index and call amvacuumcleanup (for_cleanup = true) at the end. In the
> first time calling either ambulkdelete or amvacuumcleanup the lazy
> vacuum must pass NULL to them. They return either palloc'd
> IndexBulkDeleteResult or NULL. If they returns the former the lazy
> vacuum must pass it to them again at the next time. In current design,
> since there is no guarantee that an index is always processed by the
> same vacuum process each vacuum processes save the result to DSM in
> order to share those results among vacuum processes. The 'updated'
> flags indicates that its slot is used. So we can pass the address of
> DSM if 'updated' is true, otherwise pass NULL.

Ah, OK.  Thanks for explaining.

> Almost comments I got have been incorporated to the local branch but a
> few comments need discussion. I'll submit the updated version patch
> once I addressed all of comments.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, Mar 2, 2019 at 3:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > I wonder if we really want this behavior.  Should a setting that
> > > controls the degree of parallelism when scanning the table also affect
> > > VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> > > table to be parallel by default, but rather something that the user
> > > must explicitly request.  Happy to hear other opinions.  If we do want
> > > this behavior, I think this should be written differently, something
> > > like this: The PARALLEL N option to VACUUM takes precedence over this
> > > option.
> >
> > For example, I can imagine a use case where a batch job does parallel
> > vacuum to some tables in a maintenance window. The batch operation
> > will need to compute and specify the degree of parallelism every time
> > according to for instance the number of indexes, which would be
> > troublesome. But if we can set the degree of parallelism for each
> > tables it can just to do 'VACUUM (PARALLEL)'.
>
> True, but the setting in question would also affect the behavior of
> sequential scans and index scans.  TBH, I'm not sure that the
> parallel_workers reloption is really a great design as it is: is
> hard-coding the number of workers really what people want?  Do they
> really want the same degree of parallelism for sequential scans and
> index scans?  Why should they want the same degree of parallelism also
> for VACUUM?  Maybe they do, and maybe somebody explain why they do,
> but as of now, it's not obvious to me why that should be true.

I think that there are users who want to specify the degree of
parallelism. Hard-coding the number of workers would be a reasonable
design for something like VACUUM, which is a simple operation on a
single object; since there are no joins or aggregations, it'd be
relatively easy to compute. That's why the patch introduces the
PARALLEL N option as well. I think that a reloption for parallel
vacuum would just be a way to save the degree of parallelism. And I
agree that users don't want to use the same degree of parallelism for
VACUUM, so maybe it'd be better to add a new reloption like
parallel_vacuum_workers. On the other hand, that can be a separate
patch; I can remove the reloption part from this patch and propose it
again when there are requests.

>
> > Since the parallel vacuum uses memory in the same manner as the single
> > process vacuum it's not deteriorated. I'd agree that that patch is
> > more smarter and this patch can be built on top of it but I'm
> > concerned that there two proposals on that thread and the discussion
> > has not been active for 8 months. I wonder if  it would be worth to
> > think of improving the memory allocating based on that patch after the
> > parallel vacuum get committed.
>
> Well, I think we can't just say "oh, this patch is going to use twice
> as much memory as before," which is what it looks like it's doing
> right now. If you think it's not doing that, can you explain further?

In the current design, the leader process allocates the whole DSM
segment once at the start and records dead tuples' TIDs into it. This
is the same behaviour as before, except that the dead tuple TIDs are
recorded in the shared memory segment. Once index vacuuming has
finished, the leader process re-initializes the DSM for the next
round. So parallel vacuum uses the same amount of memory as before
during execution.
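
The overall memory life cycle looks roughly like this (pseudo-code
sketch; the helper names scan_heap_page() and vacuum_indexes_parallel()
are made up just to show the flow):

for (blkno = 0; blkno < nblocks; blkno++)
{
    /* prune the page and remember dead tuple TIDs in the shared array */
    scan_heap_page(blkno, dead_tuples);

    if (dead_tuples->num_tuples >= dead_tuples->max_tuples)
    {
        /* leader and workers consume the same shared array */
        vacuum_indexes_parallel(dead_tuples);
        lazy_vacuum_heap(onerel, vacrelstats);

        /* reset and reuse the same DSM space for the next round */
        dead_tuples->num_tuples = 0;
    }
}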

>
> > Agreed. I'll separate patches and propose it.
>
> Cool.  Probably best to keep that on this thread.

Understood.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Mar 4, 2019 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Mar 2, 2019 at 3:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > I wonder if we really want this behavior.  Should a setting that
> > > > controls the degree of parallelism when scanning the table also affect
> > > > VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> > > > table to be parallel by default, but rather something that the user
> > > > must explicitly request.  Happy to hear other opinions.  If we do want
> > > > this behavior, I think this should be written differently, something
> > > > like this: The PARALLEL N option to VACUUM takes precedence over this
> > > > option.
> > >
> > > For example, I can imagine a use case where a batch job does parallel
> > > vacuum to some tables in a maintenance window. The batch operation
> > > will need to compute and specify the degree of parallelism every time
> > > according to for instance the number of indexes, which would be
> > > troublesome. But if we can set the degree of parallelism for each
> > > tables it can just to do 'VACUUM (PARALLEL)'.
> >
> > True, but the setting in question would also affect the behavior of
> > sequential scans and index scans.  TBH, I'm not sure that the
> > parallel_workers reloption is really a great design as it is: is
> > hard-coding the number of workers really what people want?  Do they
> > really want the same degree of parallelism for sequential scans and
> > index scans?  Why should they want the same degree of parallelism also
> > for VACUUM?  Maybe they do, and maybe somebody explain why they do,
> > but as of now, it's not obvious to me why that should be true.
>
> I think that there are users who want to specify the degree of
> parallelism. I think that hard-coding the number of workers would be
> good design for something like VACUUM which is a simple operation for
> single object; since there are no joins, aggregations it'd be
> relatively easy to compute it. That's why the patch introduces
> PARALLEL N option as well. I think that a reloption for parallel
> vacuum would be just a way to save the degree of parallelism. And I
> agree that users don't want to use same degree of parallelism for
> VACUUM, so maybe it'd better to add new reloption like
> parallel_vacuum_workers. On the other hand, it can be a separate
> patch, I can remove the reloption part from this patch and will
> propose it when there are requests.
>

Okay, attached is the latest version of the patch set. I've
incorporated all the comments I got and separated out the patch that
makes the vacuum options a Node (the 0001 patch). The patch no longer
uses parallel_workers; that might be proposed again in another form in
the future if requested.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Wed, Mar 6, 2019 at 1:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Okay, attached the latest version of patch set. I've incorporated all
> comments I got and separated the patch for making vacuum options a
> Node (0001 patch). And the patch doesn't use parallel_workers. It
> might be proposed in the another form again in the future if
> requested.

Why make it a Node?  I mean I think a struct makes sense, but what's
the point of giving it a NodeTag?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Mar 7, 2019 at 2:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 6, 2019 at 1:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Okay, attached the latest version of patch set. I've incorporated all
> > comments I got and separated the patch for making vacuum options a
> > Node (0001 patch). And the patch doesn't use parallel_workers. It
> > might be proposed in the another form again in the future if
> > requested.
>
> Why make it a Node?  I mean I think a struct makes sense, but what's
> the point of giving it a NodeTag?
>

Well, the main point is consistency with other nodes and keeping the code clean.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Wed, Mar 6, 2019 at 10:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Why make it a Node?  I mean I think a struct makes sense, but what's
> > the point of giving it a NodeTag?
>
> Well, the main point is consistency with other nodes and keep the code clean.

It looks to me like if we made it a plain struct rather than a node,
and embedded that struct (not a pointer) in VacuumStmt, then what
would happen is that _copyVacuumStmt and _equalVacuumStmt would have
clauses for each vacuum option individually, with a dot, like
COPY_SCALAR_FIELD(options.flags).
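
Something like this, I'd guess (just a sketch of what the copy
function would end up looking like, assuming the embedded struct keeps
the flags/nworkers fields from the current patch, and abbreviating the
other fields):

static VacuumStmt *
_copyVacuumStmt(const VacuumStmt *from)
{
    VacuumStmt *newnode = makeNode(VacuumStmt);

    COPY_SCALAR_FIELD(options.flags);
    COPY_SCALAR_FIELD(options.nworkers);
    COPY_NODE_FIELD(rels);

    return newnode;
}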

Also, the grammar production for VacuumStmt would need to be jiggered
around a bit; the way that options consolidation is done there would
have to be changed.

Neither of those things sound terribly hard or terribly messy, but on
the other hand I guess there's nothing really wrong with the way you
did it, either ... anybody else have an opinion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Mar 8, 2019 at 12:22 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 6, 2019 at 10:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > Why make it a Node?  I mean I think a struct makes sense, but what's
> > > the point of giving it a NodeTag?
> >
> > Well, the main point is consistency with other nodes and keep the code clean.
>
> It looks to me like if we made it a plain struct rather than a node,
> and embedded that struct (not a pointer) in VacuumStmt, then what
> would happen is that _copyVacuumStmt and _equalVacuumStmt would have
> clauses for each vacuum option individually, with a dot, like
> COPY_SCALAR_FIELD(options.flags).
>
> Also, the grammar production for VacuumStmt would need to be jiggered
> around a bit; the way that options consolidation is done there would
> have to be changed.
>
> Neither of those things sound terribly hard or terribly messy, but on
> the other hand I guess there's nothing really wrong with the way you
> did it, either ... anybody else have an opinion?
>

I don't have a strong opinion, but using a Node would be more suitable
in the future when we add more options to vacuum. It also seems to me
that it's unlikely we would ever change a Node back into a plain
struct, so one idea is to do it now anyway since we might need to do
it someday.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Wed, Mar 13, 2019 at 1:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I don't have a strong opinion but the using a Node would be more
> suitable in the future when we add more options to vacuum. And it
> seems to me that it's unlikely to change a Node to a plain struct. So
> there is an idea of doing it now anyway if we might need to do it
> someday.

I just tried to apply 0001 again and noticed a conflict in the
autovac_table structure in postmaster.c.

That conflict got me thinking: aren't parameters and options an awful
lot alike?  Why do we need to pass around a VacuumOptions structure
*and* a VacuumParams structure to all of these functions?  Couldn't we
just have one?  That led to the attached patch, which just gets rid of
the separate options flag and folds it into VacuumParams.  If we took
this approach, the degree of parallelism would just be another thing
that would get added to VacuumParams, and VacuumOptions wouldn't end
up existing at all.

This patch does not address the question of what the *parse tree*
representation of the PARALLEL option should look like; the idea would
be that ExecVacuum() would need to extract the value for that option and
put it into VacuumParams just as it already does for various other
things in VacuumParams.  Maybe the most natural approach would be to
convert the grammar productions for the VACUUM options list so that
they just build a list of DefElems, and then have ExecVacuum() iterate
over that list and make sense of it, as for example ExplainQuery()
already does.
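
i.e. something along these lines in ExecVacuum(), similar to what
ExplainQuery() does (untested sketch, option list abbreviated, and
params.nworkers assumed from the parallel vacuum patch):

ListCell   *lc;

foreach(lc, vacstmt->options)
{
    DefElem    *opt = (DefElem *) lfirst(lc);

    if (strcmp(opt->defname, "verbose") == 0)
        params.options |= VACOPT_VERBOSE;
    else if (strcmp(opt->defname, "freeze") == 0)
        params.options |= VACOPT_FREEZE;
    else if (strcmp(opt->defname, "parallel") == 0)
        params.nworkers = defGetInt32(opt);     /* new option from the patch */
    else
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("unrecognized VACUUM option \"%s\"", opt->defname)));
}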

I kinda like the idea of doing it that way, but then I came up with
it, so maybe you or others will think it's terrible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Mar 14, 2019 at 6:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 13, 2019 at 1:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I don't have a strong opinion but the using a Node would be more
> > suitable in the future when we add more options to vacuum. And it
> > seems to me that it's unlikely to change a Node to a plain struct. So
> > there is an idea of doing it now anyway if we might need to do it
> > someday.
>
> I just tried to apply 0001 again and noticed a conflict in the
> autovac_table structure in postmaster.c.
>
> That conflict got me thinking: aren't parameters and options an awful
> lot alike?  Why do we need to pass around a VacuumOptions structure
> *and* a VacuumParams structure to all of these functions?  Couldn't we
> just have one?  That led to the attached patch, which just gets rid of
> the separate options flag and folds it into VacuumParams.

Indeed. I like this approach. The comment of vacuum() says,

* options is a bitmask of VacuumOption flags, indicating what to do.
* (snip)
* params contains a set of parameters that can be used to customize the
* behavior.

It seems to me that the purposes of the two variables are different,
but it would be acceptable to merge them.

BTW, your patch doesn't seem to apply cleanly to the current HEAD, and
the comment of vacuum() needs to be updated.

> If we took
> this approach, the degree of parallelism would just be another thing
> that would get added to VacuumParams, and VacuumOptions wouldn't end
> up existing at all.
>

Agreed.

> This patch does not address the question of what the *parse tree*
> representation of the PARALLEL option should look like; the idea would
> be that ExecVacuum() would need to extra the value for that option and
> put it into VacuumParams just as it already does for various other
> things in VacuumParams.  Maybe the most natural approach would be to
> convert the grammar productions for the VACUUM options list so that
> they just build a list of DefElems, and then have ExecVacuum() iterate
> over that list and make sense of it, as for example ExplainQuery()
> already does.
>

Agreed. That change would help the discussion about changing the
VACUUM option syntax to a field-and-value style.

Attached are the updated version of the patch you proposed and, on top
of it, a patch that converts the grammar productions for the VACUUM
options. The latter patch moves VacuumOption to vacuum.h since the
parser no longer needs that information.

If we take this direction, I will change the parallel vacuum patch so
that it adds a new PARALLEL option and adds 'nworkers' to VacuumParams.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> >
> > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
> >> Thank you. Attached the rebased patch.
> >
> >
> > I ran some performance tests to compare the parallelism benefits,
>
> Thank you for testing!
>
> > but I got some strange results of performance overhead, may be it is
> > because, I tested it on my laptop.
>
> Hmm, I think the parallel vacuum would help for heavy workloads like a
> big table with multiple indexes. In your test result, all executions
> are completed within 1 sec, which seems to be one use case that the
> parallel vacuum wouldn't help. I suspect that the table is small,
> right? Anyway I'll also do performance tests.
>

Here are the performance test results. I've set up a 500MB table with
several indexes and made 10% of the table dirty before each vacuum,
then compared the execution time of the patched postgres with the
current HEAD (see the 'speed_up' column). In my environment:

 indexes | parallel_degree |  patched   |    head    | speed_up
---------+-----------------+------------+------------+----------
      0 |               0 |   238.2085 |   244.7625 |   1.0275
      0 |               1 |   237.7050 |   244.7625 |   1.0297
      0 |               2 |   238.0390 |   244.7625 |   1.0282
      0 |               4 |   238.1045 |   244.7625 |   1.0280
      0 |               8 |   237.8995 |   244.7625 |   1.0288
      0 |              16 |   237.7775 |   244.7625 |   1.0294
      1 |               0 |  1328.8590 |  1334.9125 |   1.0046
      1 |               1 |  1325.9140 |  1334.9125 |   1.0068
      1 |               2 |  1333.3665 |  1334.9125 |   1.0012
      1 |               4 |  1329.5205 |  1334.9125 |   1.0041
      1 |               8 |  1334.2255 |  1334.9125 |   1.0005
      1 |              16 |  1335.1510 |  1334.9125 |   0.9998
      2 |               0 |  2426.2905 |  2427.5165 |   1.0005
      2 |               1 |  1416.0595 |  2427.5165 |   1.7143
      2 |               2 |  1411.6270 |  2427.5165 |   1.7197
      2 |               4 |  1411.6490 |  2427.5165 |   1.7196
      2 |               8 |  1410.1750 |  2427.5165 |   1.7214
      2 |              16 |  1413.4985 |  2427.5165 |   1.7174
      4 |               0 |  4622.5060 |  4619.0340 |   0.9992
      4 |               1 |  2536.8435 |  4619.0340 |   1.8208
      4 |               2 |  2548.3615 |  4619.0340 |   1.8126
      4 |               4 |  1467.9655 |  4619.0340 |   3.1466
      4 |               8 |  1486.3155 |  4619.0340 |   3.1077
      4 |              16 |  1481.7150 |  4619.0340 |   3.1174
      8 |               0 |  9039.3810 |  8990.4735 |   0.9946
      8 |               1 |  4807.5880 |  8990.4735 |   1.8701
      8 |               2 |  3786.7620 |  8990.4735 |   2.3742
      8 |               4 |  2924.2205 |  8990.4735 |   3.0745
      8 |               8 |  2684.2545 |  8990.4735 |   3.3493
      8 |              16 |  2672.9800 |  8990.4735 |   3.3635
     16 |               0 | 17821.4715 | 17740.1300 |   0.9954
     16 |               1 |  9318.3810 | 17740.1300 |   1.9038
     16 |               2 |  7260.6315 | 17740.1300 |   2.4433
     16 |               4 |  5538.5225 | 17740.1300 |   3.2030
     16 |               8 |  5368.5255 | 17740.1300 |   3.3045
     16 |              16 |  5291.8510 | 17740.1300 |   3.3523
(36 rows)

Attached are the updated versions of the patches. They apply cleanly
to the current HEAD, but the 0001 patch still changes the vacuum
options to a Node since that is under discussion. After the direction
has been decided, I'll update the patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
Hello.

At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC6bsM0FfePgzSV40uXofbFSPe-Ax095TOnu5GOZ790uA@mail.gmail.com>
> Here is the performance test results. I've setup a 500MB table with
> several indexes and made 10% of table dirty before each vacuum.
> Compared execution time of the patched postgrse with the current HEAD
> (at 'speed_up' column). In my environment,
> 
>  indexes | parallel_degree |  patched   |    head    | speed_up
> ---------+-----------------+------------+------------+----------
>       0 |               0 |   238.2085 |   244.7625 |   1.0275
>       0 |               1 |   237.7050 |   244.7625 |   1.0297
>       0 |               2 |   238.0390 |   244.7625 |   1.0282
>       0 |               4 |   238.1045 |   244.7625 |   1.0280
>       0 |               8 |   237.8995 |   244.7625 |   1.0288
>       0 |              16 |   237.7775 |   244.7625 |   1.0294
>       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
>       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
>       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
>       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
>       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
>       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
>       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
>       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
>       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
>       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
>       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
>       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
>       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
>       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
>       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
>       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
>       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
>       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
>       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
>       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
>       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
>       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
>       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
>       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
>      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
>      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
>      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
>      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
>      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
>      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
> (36 rows)

For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
almost the same. I suspect that the indexes are too small, all the
index pages were in memory, and the CPU was saturated. Maybe you had
four cores, so parallel workers beyond that number had no effect.
Other normal backends would have been able to do almost nothing in the
meantime. Usually the number of parallel workers is determined so that
the I/O capacity is filled up, but under such a situation this feature
intermittently saturates the CPU capacity.

I'm not sure, but what if we did the index vacuum in a
one-tuple-by-one manner? That is, the heap vacuum passes dead tuples
one by one (or buffers a few tuples) to the workers, and the workers
process them not with bulkdelete but with a plain tuple_delete (which
we don't have yet). That could avoid the heap scan sleeping while the
index bulkdelete runs.


> Attached the updated version patches. The patches apply to the current
> HEAD cleanly but the 0001 patch still changes the vacuum option to a
> Node since it's under the discussion. After the direction has been
> decided, I'll update the patches.

As for the to-be-or-not-to-be-a-Node problem, I don't think it is
needed, but from the point of view of consistency it seems reasonable,
and it is seen in other nodes that a *Stmt node holds an options node.
But makeVacOpt, its usage, and the subsequent operations on the node
look somewhat strange. Why don't you just do
"makeNode(VacuumOptions)"?


>+    /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);

If I understand this correctly, nindexes is always > 1 there. At
least it should be asserted to be > 0 there.

>+    estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),

I don't think the name is good. (At first glance, "dt" read as "detach" to me.)

>+        if (lps->nworkers_requested > 0)
>+            appendStringInfo(&buf,
>+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",

"planned"?


>+        /* Get the next index to vacuum */
>+        if (do_parallel)
>+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
>+        else
>+            idx = nprocessed++;

It seems that both of the two cases could be handled using
LVParallelState, and most of the branches on lps or do_parallel could
be removed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> BTW your patch seems to not apply to the current HEAD cleanly and to
> need to update the comment of vacuum().

Yeah, I omitted some hunks by being stupid with 'git'.

Since you seem to like the approach, I put back the hunks I intended
to have there, pulled in one change from your v2 that looked good,
made one other tweak, and committed this.  I think I like what I did
with vacuum_open_relation a bit better than what you did; actually, I
think it cannot be right to just pass 'params' when the current code
is passing params->options & ~(VACOPT_VACUUM).  My approach avoids
that particular pitfall.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Attached the updated patch you proposed and the patch that converts
> the grammer productions for the VACUUM option on top of the former
> patch. The latter patch moves VacuumOption to vacuum.h since the
> parser no longer needs such information.

Committed.

> If we take this direction I will change the parallel vacuum patch so
> that it adds new PARALLEL option and adds 'nworkers' to VacuumParams.

Sounds good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Mar 19, 2019 at 3:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > BTW your patch seems to not apply to the current HEAD cleanly and to
> > need to update the comment of vacuum().
>
> Yeah, I omitted some hunks by being stupid with 'git'.
>
> Since you seem to like the approach, I put back the hunks I intended
> to have there, pulled in one change from your v2 that looked good,
> made one other tweak, and committed this.

Thank you!

>   I think I like what I did
> with vacuum_open_relation a bit better than what you did; actually, I
> think it cannot be right to just pass 'params' when the current code
> is passing params->options & ~(VACOPT_VACUUM).  My approach avoids
> that particular pitfall.

Agreed. Thanks.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date:

On Mon, Mar 18, 2019 at 1:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> >
> > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
> >> Thank you. Attached the rebased patch.
> >
> >
> > I ran some performance tests to compare the parallelism benefits,
>
> Thank you for testing!
>
> > but I got some strange results of performance overhead, may be it is
> > because, I tested it on my laptop.
>
> Hmm, I think the parallel vacuum would help for heavy workloads like a
> big table with multiple indexes. In your test result, all executions
> are completed within 1 sec, which seems to be one use case that the
> parallel vacuum wouldn't help. I suspect that the table is small,
> right? Anyway I'll also do performance tests.
>

Here is the performance test results. I've setup a 500MB table with
several indexes and made 10% of table dirty before each vacuum.
Compared execution time of the patched postgrse with the current HEAD
(at 'speed_up' column). In my environment,

 indexes | parallel_degree |  patched   |    head    | speed_up
---------+-----------------+------------+------------+----------
      0 |               0 |   238.2085 |   244.7625 |   1.0275
      0 |               1 |   237.7050 |   244.7625 |   1.0297
      0 |               2 |   238.0390 |   244.7625 |   1.0282
      0 |               4 |   238.1045 |   244.7625 |   1.0280
      0 |               8 |   237.8995 |   244.7625 |   1.0288
      0 |              16 |   237.7775 |   244.7625 |   1.0294
      1 |               0 |  1328.8590 |  1334.9125 |   1.0046
      1 |               1 |  1325.9140 |  1334.9125 |   1.0068
      1 |               2 |  1333.3665 |  1334.9125 |   1.0012
      1 |               4 |  1329.5205 |  1334.9125 |   1.0041
      1 |               8 |  1334.2255 |  1334.9125 |   1.0005
      1 |              16 |  1335.1510 |  1334.9125 |   0.9998
      2 |               0 |  2426.2905 |  2427.5165 |   1.0005
      2 |               1 |  1416.0595 |  2427.5165 |   1.7143
      2 |               2 |  1411.6270 |  2427.5165 |   1.7197
      2 |               4 |  1411.6490 |  2427.5165 |   1.7196
      2 |               8 |  1410.1750 |  2427.5165 |   1.7214
      2 |              16 |  1413.4985 |  2427.5165 |   1.7174
      4 |               0 |  4622.5060 |  4619.0340 |   0.9992
      4 |               1 |  2536.8435 |  4619.0340 |   1.8208
      4 |               2 |  2548.3615 |  4619.0340 |   1.8126
      4 |               4 |  1467.9655 |  4619.0340 |   3.1466
      4 |               8 |  1486.3155 |  4619.0340 |   3.1077
      4 |              16 |  1481.7150 |  4619.0340 |   3.1174
      8 |               0 |  9039.3810 |  8990.4735 |   0.9946
      8 |               1 |  4807.5880 |  8990.4735 |   1.8701
      8 |               2 |  3786.7620 |  8990.4735 |   2.3742
      8 |               4 |  2924.2205 |  8990.4735 |   3.0745
      8 |               8 |  2684.2545 |  8990.4735 |   3.3493
      8 |              16 |  2672.9800 |  8990.4735 |   3.3635
     16 |               0 | 17821.4715 | 17740.1300 |   0.9954
     16 |               1 |  9318.3810 | 17740.1300 |   1.9038
     16 |               2 |  7260.6315 | 17740.1300 |   2.4433
     16 |               4 |  5538.5225 | 17740.1300 |   3.2030
     16 |               8 |  5368.5255 | 17740.1300 |   3.3045
     16 |              16 |  5291.8510 | 17740.1300 |   3.3523
(36 rows)

The performance results are good. Do we want to add a recommended
table size to the documentation for the parallel option? Using the
parallel option on smaller tables can lead to performance overhead.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Mar 18, 2019 at 7:06 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello.
>
> At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC6bsM0FfePgzSV40uXofbFSPe-Ax095TOnu5GOZ790uA@mail.gmail.com>
> > Here is the performance test results. I've setup a 500MB table with
> > several indexes and made 10% of table dirty before each vacuum.
> > Compared execution time of the patched postgrse with the current HEAD
> > (at 'speed_up' column). In my environment,
> >
> >  indexes | parallel_degree |  patched   |    head    | speed_up
> > ---------+-----------------+------------+------------+----------
> >       0 |               0 |   238.2085 |   244.7625 |   1.0275
> >       0 |               1 |   237.7050 |   244.7625 |   1.0297
> >       0 |               2 |   238.0390 |   244.7625 |   1.0282
> >       0 |               4 |   238.1045 |   244.7625 |   1.0280
> >       0 |               8 |   237.8995 |   244.7625 |   1.0288
> >       0 |              16 |   237.7775 |   244.7625 |   1.0294
> >       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
> >       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
> >       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
> >       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
> >       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
> >       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
> >       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
> >       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
> >       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
> >       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
> >       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
> >       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
> >       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
> >       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
> >       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
> >       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
> >       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
> >       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
> >       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
> >       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
> >       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
> >       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
> >       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
> >       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
> >      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
> >      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
> >      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
> >      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
> >      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
> >      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
> > (36 rows)
>
> For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> almost the same. I suspect that the indexes are too-small and all
> the index pages were on memory and CPU is saturated. Maybe you
> had four cores and parallel workers more than the number had no
> effect.  Other normal backends should have been able do almost
> nothing meanwhile. Usually the number of parallel workers is
> determined so that IO capacity is filled up but this feature
> intermittently saturates CPU capacity very under such a
> situation.
>

I'm sorry I didn't make it clear enough. If the parallel degree is
higher than 'the number of indexes - 1', redundant workers are not
launched. So for indexes=4, 8, 16 the number of actually launched
parallel workers is at most 3, 7, 15 respectively. That's why the
results show almost the same execution time in the cases where
nindexes <= parallel_degree.

I'll share the performance test results for larger tables and indexes.

> I'm not sure, but what if we do index vacuum in one-tuple-by-one
> manner? That is, heap vacuum passes dead tuple one-by-one (or
> buffering few tuples) to workers and workers process it not by
> bulkdelete, but just tuple_delete (we don't have one). That could
> avoid the sleep time of heap-scan while index bulkdelete.
>

Just to be clear, in parallel lazy vacuum all parallel vacuum
processes, including the leader process, do index vacuuming; nobody
sleeps during index vacuuming. The leader process does the heap scan
and launches the parallel workers before index vacuuming. Each process
exclusively processes indexes one by one.
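
Each process claims the next unprocessed index by atomically
incrementing a shared counter, roughly like this (simplified sketch;
vacuum_one_index() is just a placeholder for the actual
bulkdelete/cleanup call):

for (;;)
{
    int     idx;

    /* atomically claim the next index; the leader and workers do the same */
    idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);

    if (idx >= nindexes)
        break;                  /* every index has been claimed */

    /* bulk-delete or clean up the claimed index */
    vacuum_one_index(Irel[idx], lvshared, dead_tuples);
}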

Such an index deletion method could be an optimization, but I'm not
sure that calling tuple_delete many times would be faster than one
bulkdelete. If there are many dead tuples, vacuum has to call
tuple_delete once per dead tuple, and in general one seqscan is faster
than tons of index scans. There is a proposal for such one-by-one
index deletions[1], but it's not a replacement for bulkdelete.

>
> > Attached the updated version patches. The patches apply to the current
> > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > Node since it's under the discussion. After the direction has been
> > decided, I'll update the patches.
>
> As for the to-be-or-not-to-be a node problem, I don't think it is
> needed but from the point of consistency, it seems reasonable and
> it is seen in other nodes that *Stmt Node holds option Node. But
> makeVacOpt and it's usage, and subsequent operations on the node
> look somewhat strange.. Why don't you just do
> "makeNode(VacuumOptions)"?

Thank you for the comment, but this part has gone away since the
recent commit changed the grammar production of the VACUUM command.

>
>
> >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
>
> If I understand this correctly, nindexes is always > 1 there. At
> lesat asserted that > 0 there.
>
> >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
>
> I don't think the name is good. (dt menant detach by the first look for me..)

Fixed.

>
> >+        if (lps->nworkers_requested > 0)
> >+            appendStringInfo(&buf,
> >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
>
> "planned"?

The 'planned' value shows how many parallel workers we planned to
launch. The degree of parallelism is determined based on either the
user's request or the number of indexes that the table has.

>
>
> >+        /* Get the next index to vacuum */
> >+        if (do_parallel)
> >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> >+        else
> >+            idx = nprocessed++;
>
> It seems that both of the two cases can be handled using
> LVParallelState and most of the branches by lps or do_parallel
> can be removed.
>

Sorry, I couldn't quite follow your comment. Did you mean moving
nprocessed into LVParallelState?

[1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com>
> > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > almost the same. I suspect that the indexes are too-small and all
> > the index pages were on memory and CPU is saturated. Maybe you
> > had four cores and parallel workers more than the number had no
> > effect.  Other normal backends should have been able do almost
> > nothing meanwhile. Usually the number of parallel workers is
> > determined so that IO capacity is filled up but this feature
> > intermittently saturates CPU capacity very under such a
> > situation.
> >
> 
> I'm sorry I didn't make it clear enough. If the parallel degree is
> higher than 'the number of indexes - 1' redundant workers are not
> launched. So for indexes=4, 8, 16 the number of actually launched
> parallel workers is up to 3, 7, 15 respectively. That's why the result
> shows almost the same execution time in the cases where nindexes <=
> parallel_degree.

In the 16-index case, the performance saturates at 4 workers, which
contradicts your explanation.

> I'll share the performance test result of more larger tables and indexes.
> 
> > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > buffering few tuples) to workers and workers process it not by
> > bulkdelete, but just tuple_delete (we don't have one). That could
> > avoid the sleep time of heap-scan while index bulkdelete.
> >
> 
> Just to be clear, in parallel lazy vacuum all parallel vacuum
> processes including the leader process do index vacuuming, no one
> doesn't sleep during index vacuuming. The leader process does heap
> scan and launches parallel workers before index vacuuming. Each
> processes exclusively processes indexes one by one.

The leader doesn't continue the heap scan while index vacuuming is
running, and the index page scan seems to eat up CPU easily. If index
vacuuming could run simultaneously with the next heap scan phase, we
could make the index scan finish at almost the same time as the next
round of heap scan. That would reduce the (possible) CPU contention,
but it requires twice as much shared memory as the current
implementation.

> Such index deletion method could be an optimization but I'm not sure
> that the calling tuple_delete many times would be faster than one
> bulkdelete. If there are many dead tuples vacuum has to call
> tuple_delete as much as dead tuples. In general one seqscan is faster
> than tons of indexscan. There is the proposal for such one by one
> index deletions[1] but it's not a replacement of bulkdelete.

I'm not sure what you mean by 'replacement', but it depends on how
large a part of the table is removed at once, as mentioned in that
thread. Unfortunately it doesn't seem easy to do.

> > > Attached the updated version patches. The patches apply to the current
> > > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > > Node since it's under the discussion. After the direction has been
> > > decided, I'll update the patches.
> >
> > As for the to-be-or-not-to-be a node problem, I don't think it is
> > needed but from the point of consistency, it seems reasonable and
> > it is seen in other nodes that *Stmt Node holds option Node. But
> > makeVacOpt and it's usage, and subsequent operations on the node
> > look somewhat strange.. Why don't you just do
> > "makeNode(VacuumOptions)"?
> 
> Thank you for the comment but this part has gone away as the recent
> commit changed the grammar production of vacuum command.

Oops!


> > >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
> >
> > If I understand this correctly, nindexes is always > 1 there. At
> > lesat asserted that > 0 there.
> >
> > >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
> >
> > I don't think the name is good. (dt menant detach by the first look for me..)
> 
> Fixed.
> 
> >
> > >+        if (lps->nworkers_requested > 0)
> > >+            appendStringInfo(&buf,
> > >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
> >
> > "planned"?
> 
> The 'planned' shows how many parallel workers we planned to launch.
> The degree of parallelism is determined based on either user request
> or the number of indexes that the table has.
> 
> >
> >
> > >+        /* Get the next index to vacuum */
> > >+        if (do_parallel)
> > >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> > >+        else
> > >+            idx = nprocessed++;
> >
> > It seems that both of the two cases can be handled using
> > LVParallelState and most of the branches by lps or do_parallel
> > can be removed.
> >
> 
> Sorry I couldn't get your comment. You meant to move nprocessed to
> LVParallelState?

Exactly. I meant letting lvshared point to private memory, but it
might introduce confusion.


> [1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Mon, Mar 18, 2019 at 1:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >
>> > On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> > >
>> > > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> > >>
>> > >> Thank you. Attached the rebased patch.
>> > >
>> > >
>> > > I ran some performance tests to compare the parallelism benefits,
>> >
>> > Thank you for testing!
>> >
>> > > but I got some strange results of performance overhead, may be it is
>> > > because, I tested it on my laptop.
>> >
>> > Hmm, I think the parallel vacuum would help for heavy workloads like a
>> > big table with multiple indexes. In your test result, all executions
>> > are completed within 1 sec, which seems to be one use case that the
>> > parallel vacuum wouldn't help. I suspect that the table is small,
>> > right? Anyway I'll also do performance tests.
>> >
>>
>> Here is the performance test results. I've setup a 500MB table with
>> several indexes and made 10% of table dirty before each vacuum.
>> Compared execution time of the patched postgrse with the current HEAD
>> (at 'speed_up' column). In my environment,
>>
>>  indexes | parallel_degree |  patched   |    head    | speed_up
>> ---------+-----------------+------------+------------+----------
>>       0 |               0 |   238.2085 |   244.7625 |   1.0275
>>       0 |               1 |   237.7050 |   244.7625 |   1.0297
>>       0 |               2 |   238.0390 |   244.7625 |   1.0282
>>       0 |               4 |   238.1045 |   244.7625 |   1.0280
>>       0 |               8 |   237.8995 |   244.7625 |   1.0288
>>       0 |              16 |   237.7775 |   244.7625 |   1.0294
>>       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
>>       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
>>       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
>>       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
>>       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
>>       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
>>       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
>>       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
>>       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
>>       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
>>       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
>>       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
>>       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
>>       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
>>       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
>>       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
>>       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
>>       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
>>       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
>>       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
>>       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
>>       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
>>       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
>>       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
>>      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
>>      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
>>      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
>>      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
>>      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
>>      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
>> (36 rows)
>
>
> The performance results are good. Do we want to add the recommended
> size in the document for the parallel option? the parallel option for smaller
> tables can lead to performance overhead.
>

Hmm, I don't think we can add a specific recommended size, because the
performance gain from parallel lazy vacuum depends on various things
such as the number of CPU cores, the number of indexes, the shared
buffer size, the index types, and whether the storage is HDD or SSD. I
suppose that users who want to use this option already have some sort
of performance problem, such as vacuum taking a very long time, and
they would use it for relatively large tables.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com>
> > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > almost the same. I suspect that the indexes are too-small and all
> > > the index pages were on memory and CPU is saturated. Maybe you
> > > had four cores and parallel workers more than the number had no
> > > effect.  Other normal backends should have been able do almost
> > > nothing meanwhile. Usually the number of parallel workers is
> > > determined so that IO capacity is filled up but this feature
> > > intermittently saturates CPU capacity very under such a
> > > situation.
> > >
> >
> > I'm sorry I didn't make it clear enough. If the parallel degree is
> > higher than 'the number of indexes - 1' redundant workers are not
> > launched. So for indexes=4, 8, 16 the number of actually launched
> > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > shows almost the same execution time in the cases where nindexes <=
> > parallel_degree.
>
> In the 16 indexes case, the performance saturated at 4 workers
> which contradicts to your explanation.

Because the machine I used has 4 cores, the performance doesn't
improve even if more than 4 parallel workers are launched.

>
> > I'll share the performance test result of more larger tables and indexes.
> >
> > > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > > buffering few tuples) to workers and workers process it not by
> > > bulkdelete, but just tuple_delete (we don't have one). That could
> > > avoid the sleep time of heap-scan while index bulkdelete.
> > >
> >
> > Just to be clear, in parallel lazy vacuum all parallel vacuum
> > processes including the leader process do index vacuuming, no one
> > doesn't sleep during index vacuuming. The leader process does heap
> > scan and launches parallel workers before index vacuuming. Each
> > processes exclusively processes indexes one by one.
>
> The leader doesn't continue heap-scan while index vacuuming is
> running. And the index-page-scan seems eat up CPU easily. If
> index vacuum can run simultaneously with the next heap scan
> phase, we can make index scan finishes almost the same time with
> the next round of heap scan. It would reduce the (possible) CPU
> contention. But this requires as the twice size of shared
> memoryas the current implement.

Yeah, I've considered something like a pipelining approach in which
one process continues to queue the dead tuples and other processes
fetch and process them during index vacuuming, but the current version
of the patch employs the simplest approach as a first step. Once we
have the retail index deletion approach, we might be able to use it
for parallel vacuum.

>
> > Such index deletion method could be an optimization but I'm not sure
> > that the calling tuple_delete many times would be faster than one
> > bulkdelete. If there are many dead tuples vacuum has to call
> > tuple_delete as much as dead tuples. In general one seqscan is faster
> > than tons of indexscan. There is the proposal for such one by one
> > index deletions[1] but it's not a replacement of bulkdelete.
>
> I'm not sure what you mean by 'replacement' but it depends on how
> large part of a table is removed at once. As mentioned in the
> thread. But unfortunately it doesn't seem easy to do..
>
> > > > Attached the updated version patches. The patches apply to the current
> > > > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > > > Node since it's under the discussion. After the direction has been
> > > > decided, I'll update the patches.
> > >
> > > As for the to-be-or-not-to-be a node problem, I don't think it is
> > > needed but from the point of consistency, it seems reasonable and
> > > it is seen in other nodes that *Stmt Node holds option Node. But
> > > makeVacOpt and it's usage, and subsequent operations on the node
> > > look somewhat strange.. Why don't you just do
> > > "makeNode(VacuumOptions)"?
> >
> > Thank you for the comment but this part has gone away as the recent
> > commit changed the grammar production of vacuum command.
>
> Oops!
>
>
> > > >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > > >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
> > >
> > > If I understand this correctly, nindexes is always > 1 there. At
> > > lesat asserted that > 0 there.
> > >
> > > >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
> > >
> > > I don't think the name is good. (dt menant detach by the first look for me..)
> >
> > Fixed.
> >
> > >
> > > >+        if (lps->nworkers_requested > 0)
> > > >+            appendStringInfo(&buf,
> > > >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
> > >
> > > "planned"?
> >
> > The 'planned' shows how many parallel workers we planned to launch.
> > The degree of parallelism is determined based on either user request
> > or the number of indexes that the table has.
> >
> > >
> > >
> > > >+        /* Get the next index to vacuum */
> > > >+        if (do_parallel)
> > > >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> > > >+        else
> > > >+            idx = nprocessed++;
> > >
> > > It seems that both of the two cases can be handled using
> > > LVParallelState and most of the branches by lps or do_parallel
> > > can be removed.
> > >
> >
> > Sorry I couldn't get your comment. You meant to move nprocessed to
> > LVParallelState?
>
> Exactly. I meant letting lvshared points to private memory, but
> it might introduce confusion.

Hmm, I'm not sure it would be a good idea. It would introduce
confusion, as you mentioned. And since 'nprocessed' has to be a
pg_atomic_uint32 in parallel mode, we would end up with yet another
branch.
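
Just to illustrate the branch in question, here is a minimal sketch (the
struct and function names are placeholders, not the patch's actual
definitions):

/*
 * Sketch only: how the "next index to process" is picked in the two modes.
 * In parallel mode the counter lives in DSM and must be a pg_atomic_uint32;
 * in serial mode a plain local int is enough, hence the branch.
 */
#include "postgres.h"
#include "port/atomics.h"

typedef struct LVSharedSketch
{
    pg_atomic_uint32 nprocessed;    /* shared counter for parallel mode */
} LVSharedSketch;

static void
process_indexes_sketch(LVSharedSketch *lvshared, bool do_parallel, int nindexes)
{
    int     nprocessed = 0;         /* local counter for serial mode */

    for (;;)
    {
        int     idx;

        if (do_parallel)
            idx = (int) pg_atomic_fetch_add_u32(&lvshared->nprocessed, 1);
        else
            idx = nprocessed++;

        if (idx >= nindexes)
            break;

        /* ... vacuum or clean up the idx'th index here ... */
    }
}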

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
At Tue, 19 Mar 2019 19:01:06 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=kRagFLjUiBg@mail.gmail.com>
> On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >
> > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com>
> > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > > almost the same. I suspect that the indexes are too-small and all
> > > > the index pages were on memory and CPU is saturated. Maybe you
> > > > had four cores and parallel workers more than the number had no
> > > > effect.  Other normal backends should have been able do almost
> > > > nothing meanwhile. Usually the number of parallel workers is
> > > > determined so that IO capacity is filled up but this feature
> > > > intermittently saturates CPU capacity very under such a
> > > > situation.
> > > >
> > >
> > > I'm sorry I didn't make it clear enough. If the parallel degree is
> > > higher than 'the number of indexes - 1' redundant workers are not
> > > launched. So for indexes=4, 8, 16 the number of actually launched
> > > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > > shows almost the same execution time in the cases where nindexes <=
> > > parallel_degree.
> >
> > In the 16 indexes case, the performance saturated at 4 workers
> > which contradicts to your explanation.
> 
> Because the machine I used has 4 cores the performance doesn't get
> improved even if more than 4 parallel workers are launched.

That is what I meant in the text you cited. Sorry for the perhaps
hard-to-read phrasing.

> >
> > > I'll share the performance test result of more larger tables and indexes.
> > >
> > > > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > > > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > > > buffering few tuples) to workers and workers process it not by
> > > > bulkdelete, but just tuple_delete (we don't have one). That could
> > > > avoid the sleep time of heap-scan while index bulkdelete.
> > > >
> > >
> > > Just to be clear, in parallel lazy vacuum all parallel vacuum
> > > processes, including the leader process, do index vacuuming; none
> > > of them sleeps during index vacuuming. The leader process does the
> > > heap scan and launches parallel workers before index vacuuming.
> > > Each process works on indexes exclusively, one at a time.
> >
> > The leader doesn't continue the heap scan while index vacuuming is
> > running. And the index-page scan seems to eat up CPU easily. If
> > index vacuuming could run simultaneously with the next heap scan
> > phase, we could make the index scan finish at almost the same time
> > as the next round of heap scan. It would reduce the (possible) CPU
> > contention. But this requires twice as much shared memory as the
> > current implementation.
> 
> Yeah, I've considered something like a pipelining approach, where one
> process continues to queue the dead tuples and another process fetches
> and processes them during index vacuuming, but the current version of
> the patch employs the simplest approach as a first step. Once we have
> the retail index deletion approach, we might be able to use it for
> parallel vacuum.

Ok, I understood the direction.

...
> > > Sorry I couldn't get your comment. You meant to move nprocessed to
> > > LVParallelState?
> >
> > Exactly. I meant letting lvshared points to private memory, but
> > it might introduce confusion.
> 
> Hmm, I'm not sure it would be a good idea. It would introduce
> confusion, as you mentioned. And since 'nprocessed' has to be a
> pg_atomic_uint32 in parallel mode, we would end up with yet another
> branch.

Ok. Agreed. Thank you for your patience.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
At Tue, 19 Mar 2019 17:51:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoCUZQmyXrwDw57ejoR-j1QrGqm_vrQKOkif_aJK4Gih6Q@mail.gmail.com>
> On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
> > The performance results are good. Do we want to add the recommended
> > size in the document for the parallel option? the parallel option for smaller
> > tables can lead to performance overhead.
> >
> 
> Hmm, I don't think we can add the specific recommended size because
> the performance gain by parallel lazy vacuum depends on various things
> such as CPU cores, the number of indexes, shared buffer size, index
> types, HDD or SSD. I suppose that users who want to use this option
> have some sort of performance problem such as that vacuum takes a very
> long time. They would use it for relatively larger tables.

I agree that we have no recommended setting, but I strongly think that documentation on the downsides or possible side
effects of this feature is required for those who are going to use it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Mar 19, 2019 at 7:15 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> At Tue, 19 Mar 2019 19:01:06 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=kRagFLjUiBg@mail.gmail.com>
> > On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > >
> > > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com>
> > > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > > > almost the same. I suspect that the indexes are too-small and all
> > > > > the index pages were on memory and CPU is saturated. Maybe you
> > > > > had four cores and parallel workers more than the number had no
> > > > > effect.  Other normal backends should have been able do almost
> > > > > nothing meanwhile. Usually the number of parallel workers is
> > > > > determined so that IO capacity is filled up but this feature
> > > > > intermittently saturates CPU capacity very under such a
> > > > > situation.
> > > > >
> > > >
> > > > I'm sorry I didn't make it clear enough. If the parallel degree is
> > > > higher than 'the number of indexes - 1' redundant workers are not
> > > > launched. So for indexes=4, 8, 16 the number of actually launched
> > > > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > > > shows almost the same execution time in the cases where nindexes <=
> > > > parallel_degree.
> > >
> > > In the 16 indexes case, the performance saturated at 4 workers
> > > which contradicts to your explanation.
> >
> > Because the machine I used has 4 cores the performance doesn't get
> > improved even if more than 4 parallel workers are launched.
>
> That is what I meant in the text you cited. Sorry for the perhaps
> hard-to-read phrasing.

I understood now. Thank you!


Attached are the updated version patches, incorporating all review comments.

Commit 6776142 changed the grammar production of the VACUUM command. This
patch adds the PARALLEL option on top of that commit.

I realized that commit 6776142 breaks the indentation in ExecVacuum() and
that the include of nodes/parsenodes.h is no longer needed. Sorry, that's
my mistake. The attached patch (vacuum_fix.patch) fixes both, although the
indentation issue would be resolved by pgindent before release anyway.

In parsing the VACUUM command, since only the PARALLEL option can take an
argument, I've added a check in ExecVacuum to error out when other
options have an argument. But it might be good to make the other vacuum
options (perhaps except for the DISABLE_PAGE_SKIPPING option) accept an
argument, just like the EXPLAIN command.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Mar 19, 2019 at 7:29 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> At Tue, 19 Mar 2019 17:51:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoCUZQmyXrwDw57ejoR-j1QrGqm_vrQKOkif_aJK4Gih6Q@mail.gmail.com>
> > On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
> > <kommi.haribabu@gmail.com> wrote:
> > > The performance results are good. Do we want to add the recommended
> > > size in the document for the parallel option? the parallel option for smaller
> > > tables can lead to performance overhead.
> > >
> >
> > Hmm, I don't think we can add the specific recommended size because
> > the performance gain by parallel lazy vacuum depends on various things
> > such as CPU cores, the number of indexes, shared buffer size, index
> > types, HDD or SSD. I suppose that users who want to use this option
> > have some sort of performance problem such as that vacuum takes a very
> > long time. They would use it for relatively larger tables.
>
> I agree that we have no recommended setting, but I strongly think that documentation on the downsides or possible side
> effects of this feature is required for those who are going to use it.
>

I think that the side effect of parallel lazy vacuum would be to
consume more CPU and I/O bandwidth, but that is also true for other
utility commands (e.g. parallel CREATE INDEX). The description of
max_parallel_maintenance_workers documents such things[1]. Anything
else to document?

[1] https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
Hello

> * in_parallel is true if we're performing parallel lazy vacuum. Since any
> * updates are not allowed during parallel mode we don't update statistics
> * but set the index bulk-deletion result to *stats. Otherwise we update it
> * and set NULL.

lazy_cleanup_index has the in_parallel argument only for this purpose, but the caller still has to check in_parallel after the lazy_cleanup_index call and do something else with stats for parallel execution.

Would it be better to always return stats and update statistics in the caller? For example, is it possible to update all index stats in lazy_vacuum_all_indexes? This routine is always the parallel leader and has the comment /* Do post-vacuum cleanup and statistics update for each index */ on the for_cleanup=true call.

I think we need a note in the documentation that the parallel leader is not counted in the PARALLEL N option, so with the PARALLEL 2 option we would use 3 processes. Or should we even change the behavior? Default with PARALLEL 1 - only the current backend runs in a single process; PARALLEL 2 - leader + one parallel worker, i.e. two processes working in parallel.

regards, Sergei


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Tue, Mar 19, 2019 at 3:59 AM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> The leader doesn't continue the heap scan while index vacuuming is
> running. And the index-page scan seems to eat up CPU easily. If
> index vacuuming could run simultaneously with the next heap scan
> phase, we could make the index scan finish at almost the same time
> as the next round of heap scan. It would reduce the (possible) CPU
> contention. But this requires twice as much shared memory as the
> current implementation.

I think you're approaching this from the wrong point of view.  If we
have a certain amount of memory available, is it better to (a) fill
the entire thing with dead tuples once, or (b) better to fill half of
it with dead tuples, start index vacuuming, and then fill the other
half of it with dead tuples for the next index-vacuum cycle while the
current one is running?  I think the answer is that (a) is clearly
better, because it results in half as many index vacuum cycles.

We can't really ask the user how much memory it's OK to use and then
use twice as much.  But if we could, what you're proposing here is
probably still not the right way to use it.
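
(To put rough numbers on it, purely for illustration: each dead-tuple TID
takes 6 bytes, so 1GB of maintenance_work_mem holds on the order of 170
million TIDs.  With, say, 350 million dead tuples, (a) needs two
index-vacuum cycles while the half-sized batches of (b) need four, and
every cycle reads each index in full.)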

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Tue, Mar 19, 2019 at 7:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> In parsing the VACUUM command, since only the PARALLEL option can take an
> argument, I've added a check in ExecVacuum to error out when other
> options have an argument. But it might be good to make the other vacuum
> options (perhaps except for the DISABLE_PAGE_SKIPPING option) accept an
> argument, just like the EXPLAIN command.

I think all of the existing options, including DISABLE_PAGE_SKIPPING,
should permit an argument that is passed to defGetBoolean().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Mar 22, 2019 at 4:53 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Mar 19, 2019 at 7:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > In parsing the VACUUM command, since only the PARALLEL option can take an
> > argument, I've added a check in ExecVacuum to error out when other
> > options have an argument. But it might be good to make the other vacuum
> > options (perhaps except for the DISABLE_PAGE_SKIPPING option) accept an
> > argument, just like the EXPLAIN command.
>
> I think all of the existing options, including DISABLE_PAGE_SKIPPING,
> should permit an argument that is passed to defGetBoolean().
>

Agreed. The attached 0001 patch makes that change.

On Thu, Mar 21, 2019 at 8:05 PM Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hello
>

Thank you for reviewing the patch!

> > * in_parallel is true if we're performing parallel lazy vacuum. Since any
> > * updates are not allowed during parallel mode we don't update statistics
> > * but set the index bulk-deletion result to *stats. Otherwise we update it
> > * and set NULL.
>
> lazy_cleanup_index has the in_parallel argument only for this purpose, but the caller still has to check in_parallel after the lazy_cleanup_index call and do something else with stats for parallel execution.
> Would it be better to always return stats and update statistics in the caller? For example, is it possible to update all index stats in lazy_vacuum_all_indexes? This routine is always the parallel leader and has the comment /* Do post-vacuum cleanup and statistics update for each index */ on the for_cleanup=true call.

Agreed. I've changed the patch so that we update index statistics in
lazy_vacuum_all_indexes().

>
> I think we need a note in the documentation that the parallel leader is not counted in the PARALLEL N option, so with the PARALLEL 2 option we would use 3 processes. Or should we even change the behavior? Default with PARALLEL 1 - only the current backend runs in a single process; PARALLEL 2 - leader + one parallel worker, i.e. two processes working in parallel.
>

Hmm, the documentation says "Perform vacuum index and cleanup index
phases of VACUUM in parallel using N background workers". Doesn't it
already explain that?

Attached is the updated version patch. The 0001 patch allows all existing
vacuum options to take a boolean argument. The 0002 patch introduces
parallel lazy vacuum. The 0003 patch adds a -P (--parallel) option to the
vacuumdb command.



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date:

On Fri, Mar 22, 2019 at 4:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Attached is the updated version patch. The 0001 patch allows all existing
vacuum options to take a boolean argument. The 0002 patch introduces
parallel lazy vacuum. The 0003 patch adds a -P (--parallel) option to the
vacuumdb command.

Thanks for sharing the updated patches.

0001 patch:

+    PARALLEL [ <replaceable class="parameter">N</replaceable> ]

But this patch contains the PARALLEL syntax with no explanation; I saw that
it is explained in 0002. It is not a problem, just mentioning it.

+      Specifies parallel degree for <literal>PARALLEL</literal> option. The
+      value must be at least 1. If the parallel degree
+      <replaceable class="parameter">integer</replaceable> is omitted, then
+      <command>VACUUM</command> decides the number of workers based on number of
+      indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>.

Can we also add some more details about backend participation? Parallel workers
come into the picture only when there are at least 2 indexes on the table.

+ /*
+ * Do post-vacuum cleanup and statistics update for each index if
+ * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do
+ * only post-vacum cleanup and then update statistics after exited
+ * from parallel mode.
+ */
+ lazy_vacuum_all_indexes(vacrelstats, Irel, nindexes, indstats,
+ lps, true);

How about renaming the above function, as it does the cleanup also?
lazy_vacuum_or_cleanup_all_indexes?


+ if (!IsInParallelVacuum(lps))
+ {
+ /*
+ * Update index statistics. If in parallel lazy vacuum, we will
+ * update them after exited from parallel mode.
+ */
+ lazy_update_index_statistics(Irel[idx], stats[idx]);
+
+ if (stats[idx])
+ pfree(stats[idx]);
+ }

The above check in lazy_vacuum_all_indexes can be combined with the outer
if check where the memcpy is happening. I still feel that the logic around the stats
is a little bit complex.

+ if (IsParallelWorker())
+ msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum worker";
+ else
+ msg = "scanned index \"%s\" to remove %d row versions";

I feel this way of building the error message may not be picked up for translation.
Is there any problem with duplicating the entire ereport call with the changed message?

+ for (i = 0; i < nindexes; i++)
+ {
+ LVIndStats *s = &(copied_indstats[i]);
+
+ if (s->updated)
+ lazy_update_index_statistics(Irel[i], &(s->stats));
+ }
+
+ pfree(copied_indstats);

Why can't we use the shared memory directly to update the stats once all the workers
are finished, instead of copying them to local memory?

+ tab->at_params.nworkers = 0; /* parallel lazy autovacuum is not supported */

The user is not required to provide the number of workers even when parallel vacuum
can work, so just setting the above parameter doesn't stop the parallel workers by
itself; the user must also pass the PARALLEL option. So mentioning that as well will
be helpful later when we start supporting it, or for someone who is reading the code.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
Hello.

At Thu, 21 Mar 2019 15:51:40 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmobkRtLb5frmEF5t9U=d+iV9c5emtN+NrRS_xrHaH1Z20A@mail.gmail.com>
> On Tue, Mar 19, 2019 at 3:59 AM Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > The leader doesn't continue the heap scan while index vacuuming is
> > running. And the index-page scan seems to eat up CPU easily. If
> > index vacuuming could run simultaneously with the next heap scan
> > phase, we could make the index scan finish at almost the same time
> > as the next round of heap scan. It would reduce the (possible) CPU
> > contention. But this requires twice as much shared memory as the
> > current implementation.
> 
> I think you're approaching this from the wrong point of view.  If we
> have a certain amount of memory available, is it better to (a) fill
> the entire thing with dead tuples once, or (b) better to fill half of
> it with dead tuples, start index vacuuming, and then fill the other
> half of it with dead tuples for the next index-vacuum cycle while the
> current one is running?  I think the answer is that (a) is clearly

Sure.

> better, because it results in half as many index vacuum cycles.

The "problem" I see there is it stops heap scanning on the leader
process.  The leader cannot start the heap scan until the index
scan on workers end.

The heap scan is expected not to stop by the half-and-half
stratregy especially when the whole index pages are on
memory. But it is not always the case, of course.

> We can't really ask the user how much memory it's OK to use and then
> use twice as much.  But if we could, what you're proposing here is
> probably still not the right way to use it.

Yes. I thought I wrote it with that implication; "requires twice as much
shared memory" has the negative implications you describe above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
Hello. I forgot to mention a point.

At Fri, 22 Mar 2019 14:02:36 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoD7rqZPPyV7z4bku8Mn8AE2_kRdW1hTO4Lrsp+vn_U1kQ@mail.gmail.com>
> Attached is the updated version patch. The 0001 patch allows all existing
> vacuum options to take a boolean argument. The 0002 patch introduces
> parallel lazy vacuum. The 0003 patch adds a -P (--parallel) option to the
> vacuumdb command.

> +    if (IsParallelWorker())
> +        msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum worker";
> +    else
> +        msg = "scanned index \"%s\" to remove %d row versions";
>      ereport(elevel,
> -            (errmsg("scanned index \"%s\" to remove %d row versions",
> +            (errmsg(msg,
>                      RelationGetRelationName(indrel),
> -                    vacrelstats->num_dead_tuples),
> +                    dead_tuples->num_tuples),

The msg variable prevents NLS from working. Please enclose the right-hand
literals in gettext_noop().
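
A sketch of what I mean, reusing the variables from the quoted hunk
(whether you keep the msg variable at all is up to you):

    const char *msg;

    if (IsParallelWorker())
        msg = gettext_noop("scanned index \"%s\" to remove %d row versions by parallel vacuum worker");
    else
        msg = gettext_noop("scanned index \"%s\" to remove %d row versions");

    ereport(elevel,
            (errmsg(msg,
                    RelationGetRelationName(indrel),
                    dead_tuples->num_tuples)));

gettext_noop() only marks the literals for extraction; errmsg() still
translates the format string it receives at run time.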

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you for reviewing the patch.

I don't think the approach in v20-0001 is quite right.

         if (strcmp(opt->defname, "verbose") == 0)
-            params.options |= VACOPT_VERBOSE;
+            params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0;

It seems to me that it would be better to do declare a separate
boolean for each flag at the top; e.g. bool verbose.  Then here do
verbose = defGetBoolean(opt).  And then after the loop do
params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for
other options.

The thing I don't like about the way you have it here is that it's not
going to work well for options that are true by default but can
optionally be set to false.  In that case, you would need to start
with the bit set and then clear it, but |= can only set bits, not
clear them.  I went and looked at the VACUUM (INDEX_CLEANUP) patch on
the other thread and it doesn't have any special handling for that
case, which makes me suspect that if you use that patch, the reloption
works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually
succeed in disabling index cleanup.  The structure I suggested above
would fix that.
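
Something along these lines is what I have in mind (just a sketch of the
shape, untested; the option list is meant to mirror what ExecVacuum
currently accepts):

    bool        verbose = false;
    bool        skip_locked = false;
    bool        analyze = false;
    bool        freeze = false;
    bool        full = false;
    bool        disable_page_skipping = false;
    ListCell   *lc;

    foreach(lc, vacstmt->options)
    {
        DefElem    *opt = (DefElem *) lfirst(lc);

        if (strcmp(opt->defname, "verbose") == 0)
            verbose = defGetBoolean(opt);
        else if (strcmp(opt->defname, "skip_locked") == 0)
            skip_locked = defGetBoolean(opt);
        else if (strcmp(opt->defname, "analyze") == 0)
            analyze = defGetBoolean(opt);
        else if (strcmp(opt->defname, "freeze") == 0)
            freeze = defGetBoolean(opt);
        else if (strcmp(opt->defname, "full") == 0)
            full = defGetBoolean(opt);
        else if (strcmp(opt->defname, "disable_page_skipping") == 0)
            disable_page_skipping = defGetBoolean(opt);
        else
            ereport(ERROR,
                    (errcode(ERRCODE_SYNTAX_ERROR),
                     errmsg("unrecognized VACUUM option \"%s\"", opt->defname),
                     parser_errposition(pstate, opt->location)));
    }

    /* Fold the booleans into the flags word once, after the loop */
    params.options =
        (vacstmt->is_vacuumcmd ? VACOPT_VACUUM : VACOPT_ANALYZE) |
        (verbose ? VACOPT_VERBOSE : 0) |
        (skip_locked ? VACOPT_SKIP_LOCKED : 0) |
        (analyze ? VACOPT_ANALYZE : 0) |
        (freeze ? VACOPT_FREEZE : 0) |
        (full ? VACOPT_FULL : 0) |
        (disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0);

That way an option that defaults to true just starts as "bool foo = true;"
and gets cleanly overridden by defGetBoolean().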

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Mar 29, 2019 at 4:53 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Thank you for reviewing the patch.
>
> I don't think the approach in v20-0001 is quite right.
>
>          if (strcmp(opt->defname, "verbose") == 0)
> -            params.options |= VACOPT_VERBOSE;
> +            params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0;
>
> It seems to me that it would be better to do declare a separate
> boolean for each flag at the top; e.g. bool verbose.  Then here do
> verbose = defGetBoolean(opt).  And then after the loop do
> params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for
> other options.
>
> The thing I don't like about the way you have it here is that it's not
> going to work well for options that are true by default but can
> optionally be set to false.  In that case, you would need to start
> with the bit set and then clear it, but |= can only set bits, not
> clear them.  I went and looked at the VACUUM (INDEX_CLEANUP) patch on
> the other thread and it doesn't have any special handling for that
> case, which makes me suspect that if you use that patch, the reloption
> works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually
> succeed in disabling index cleanup.  The structure I suggested above
> would fix that.
>

You're right, the previous patches are wrong. Attached the updated
version patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Mar 28, 2019 at 10:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> You're right, the previous patches are wrong. Attached the updated
> version patches.

0001 looks good now.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Mar 29, 2019 at 9:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 28, 2019 at 10:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > You're right, the previous patches are wrong. Attached the updated
> > version patches.
>
> 0001 looks good now.  Committed.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Mar 29, 2019 at 11:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Mar 29, 2019 at 4:53 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > Thank you for reviewing the patch.
> >
> > I don't think the approach in v20-0001 is quite right.
> >
> >          if (strcmp(opt->defname, "verbose") == 0)
> > -            params.options |= VACOPT_VERBOSE;
> > +            params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0;
> >
> > It seems to me that it would be better to do declare a separate
> > boolean for each flag at the top; e.g. bool verbose.  Then here do
> > verbose = defGetBoolean(opt).  And then after the loop do
> > params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for
> > other options.
> >
> > The thing I don't like about the way you have it here is that it's not
> > going to work well for options that are true by default but can
> > optionally be set to false.  In that case, you would need to start
> > with the bit set and then clear it, but |= can only set bits, not
> > clear them.  I went and looked at the VACUUM (INDEX_CLEANUP) patch on
> > the other thread and it doesn't have any special handling for that
> > case, which makes me suspect that if you use that patch, the reloption
> > works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually
> > succeed in disabling index cleanup.  The structure I suggested above
> > would fix that.
> >
>
> You're right, the previous patches are wrong. Attached the updated
> version patches.
>

These patches conflict with the current HEAD. Attached the updated patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> These patches conflict with the current HEAD. Attached the updated patches.

They'll need another rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Apr 5, 2019 at 4:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > These patches conflict with the current HEAD. Attached the updated patches.
>
> They'll need another rebase.
>

Thank you for the notice. Rebased.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
Thank you for the rebased version.

At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC_s0H0x-dDPhVJEqMYcnKYOMjESXd6r_9bbc3ZZegg1A@mail.gmail.com>
> Thank you for the notice. Rebased.

+    <term><replaceable class="parameter">integer</replaceable></term>
+    <listitem>
+     <para>
+      Specifies parallel degree for <literal>PARALLEL</literal> option. The
+      value must be at least 1. If the parallel degree
+      <replaceable class="parameter">integer</replaceable> is omitted, then
+      <command>VACUUM</command> decides the number of workers based on number of
+      indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>.
+     </para>
+    </listitem>
+   </varlistentry>

I'm quite confused to see this. I suppose the <para> should be a
description about <integer> parameters. Actually the existing
<boolean> entry is describing the boolean itself.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Apr 5, 2019 at 3:47 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Thank you for the rebased version.
>
> At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC_s0H0x-dDPhVJEqMYcnKYOMjESXd6r_9bbc3ZZegg1A@mail.gmail.com>
> > Thank you for the notice. Rebased.
>
> +    <term><replaceable class="parameter">integer</replaceable></term>
> +    <listitem>
> +     <para>
> +      Specifies parallel degree for <literal>PARALLEL</literal> option. The
> +      value must be at least 1. If the parallel degree
> +      <replaceable class="parameter">integer</replaceable> is omitted, then
> +      <command>VACUUM</command> decides the number of workers based on number of
> +      indexes on the relation which further limited by
> +      <xref linkend="guc-max-parallel-workers-maintenance"/>.
> +     </para>
> +    </listitem>
> +   </varlistentry>
>

Thank you for reviewing the patch.

> I'm quite confused to see this. I suppose the <para> should be a
> description about <integer> parameters. Actually the existing
> <boolean> entry is describing the boolean itself.
>

Indeed. How about the following description?

PARALLEL
Perform the vacuum index and cleanup index phases of VACUUM in parallel
using integer background workers (for details of each vacuum phase,
please refer to Table 27.25). If the parallel degree integer is omitted,
then VACUUM decides the number of workers based on the number of indexes
on the relation, which is further limited by
max_parallel_maintenance_workers. Only one worker can be used per index,
so parallel workers are launched only when there are at least 2 indexes
on the table. Workers for vacuum are launched before starting each phase
and exit at the end of the phase. These behaviors might change in a
future release. This option cannot be used with the FULL option.

integer
Specifies a positive integer value passed to the selected option. The
integer value can also be omitted, in which case the default value of
the selected option is used.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Apr 5, 2019 at 4:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Apr 5, 2019 at 3:47 PM Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >
> > Thank you for the rebased version.
> >
> > At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC_s0H0x-dDPhVJEqMYcnKYOMjESXd6r_9bbc3ZZegg1A@mail.gmail.com>
> > > Thank you for the notice. Rebased.
> >
> > +    <term><replaceable class="parameter">integer</replaceable></term>
> > +    <listitem>
> > +     <para>
> > +      Specifies parallel degree for <literal>PARALLEL</literal> option. The
> > +      value must be at least 1. If the parallel degree
> > +      <replaceable class="parameter">integer</replaceable> is omitted, then
> > +      <command>VACUUM</command> decides the number of workers based on number of
> > +      indexes on the relation which further limited by
> > +      <xref linkend="guc-max-parallel-workers-maintenance"/>.
> > +     </para>
> > +    </listitem>
> > +   </varlistentry>
> >
>
> Thank you for reviewing the patch.
>
> > I'm quite confused to see this. I suppose the <para> should be a
> > description about <integer> parameters. Actually the existing
> > <boolean> entry is describing the boolean itself.
> >
>
> Indeed. How about the following description?
>

Attached are the updated version patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
Hello.

# Is this still living? I changed the status to "needs review"

At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAuD3txrxucnVtM6NGo=JGSjs3VDkoCzN0jGz_egc_82g@mail.gmail.com>
> > Indeed. How about the following description?
> >
> 
> Attached the updated version patches.

Thanks.

heapam.h is including access/parallel.h but the file doesn't use
parallel.h stuff and storage/shm_toc.h and storage/dsm.h are
enough.

+ * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM
+ * keys conflicting with plan_node_id we can use small integers.

Yeah, this is right, but "plan_node_id" seems abrupt
there. Please prepend "differently from parallel execution code"
or .. I think no excuse is needed to use that numbers. The
executor code is already making an excuse for the large numbers
as unusual instead.

+ * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel
+ * mode and prepared the DSM segments.
+ */
+#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL)

we *are* in?

The name "IsInParallleVacuum()" looks (to me) like suggesting
"this process is a parallel vacuum worker".  How about
ParallelVacuumIsActive?


+typedef struct LVIndStats
+typedef struct LVDeadTuples
+typedef struct LVShared
+typedef struct LVParallelState

The names are confusing, and the name LVShared is too
generic. Shared-only structs are better to be marked in the name.
That is, maybe it would be better that LVIndStats were
LVSharedIndStats and LVShared were LVSharedRelStats.

It might be better that LVIndStats were moved out from LVShared,
but I'm not confident.

+static void
+lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel
...
+    lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup);
...
+    do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats,
+                                  lps->lvshared, vacrelstats->dead_tuples);
...
+    lazy_end_parallel_index_vacuum(lps, !for_cleanup);

The function takes the parameter for_cleanup, but the flag is
used by the three subfunctions in utterly ununified way. It seems
to me useless to store for_cleanup in lvshared and lazy_end is
rather confusing. There's no explanation why "reinitialization"
== "!for_cleanup". In the first place,
lazy_begin_parallel_index_vacuum and
lazy_end_parallel_index_vacuum are called only from the function
and rather short so it doesn't seem reasonable that the are
independend functions.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Apr 8, 2019 at 7:25 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello.
>
> # Is this still living? I changed the status to "needs review"
>
> At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAuD3txrxucnVtM6NGo=JGSjs3VDkoCzN0jGz_egc_82g@mail.gmail.com>
> > > Indeed. How about the following description?
> > >
> >
> > Attached the updated version patches.
>
> Thanks.
>

Thank you for reviewing the patch!

> heapam.h is including access/parallel.h but the file doesn't use
> parallel.h stuff and storage/shm_toc.h and storage/dsm.h are
> enough.

Fixed.

>
> + * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM
> + * keys conflicting with plan_node_id we can use small integers.
>
> Yeah, this is right, but "plan_node_id" seems abrupt
> there. Please prepend "differently from parallel execution code"
> or .. I think no excuse is needed to use that numbers. The
> executor code is already making an excuse for the large numbers
> as unusual instead.

Fixed.

>
> + * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel
> + * mode and prepared the DSM segments.
> + */
> +#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL)
>
> we *are* in?

Fixed.

>
> The name "IsInParallleVacuum()" looks (to me) like suggesting
> "this process is a parallel vacuum worker".  How about
> ParallelVacuumIsActive?

Fixed.

>
>
> +typedef struct LVIndStats
> +typedef struct LVDeadTuples
> +typedef struct LVShared
> +typedef struct LVParallelState
>
> The names are confusing, and the name LVShared is too
> generic. Shared-only structs are better to be marked in the name.
> That is, maybe it would be better that LVIndStats were
> LVSharedIndStats and LVShared were LVSharedRelStats.

Hmm, LVShared actually also stores various things that are not
relevant to the relation. I'm not sure it's a good idea to rename
it to LVSharedRelStats. When we support parallel vacuum for other
vacuum steps, adding a struct for storing only relation statistics
might work well.

>
> It might be better that LVIndStats were moved out from LVShared,
> but I'm not confident.
>
> +static void
> +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel
> ...
> +       lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup);
> ...
> +       do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats,
> +                                  lps->lvshared, vacrelstats->dead_tuples);
> ...
> +       lazy_end_parallel_index_vacuum(lps, !for_cleanup);
>
> The function takes the parameter for_cleanup, but the flag is
> used by the three subfunctions in utterly ununified way. It seems
> to me useless to store for_cleanup in lvshared

I think that we need to store for_cleanup, or something telling
vacuum workers whether to do index vacuuming or index cleanup, in
lvshared. Or can we use something else instead?
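
For example (illustrative names only; the real struct in the patch is laid
out differently):

/*
 * Sketch: the leader sets the phase in the DSM-backed struct before
 * launching workers for each round; workers just branch on it.
 */
typedef enum LVParallelPhase
{
    LV_PHASE_VACUUM_INDEX,      /* calls ambulkdelete */
    LV_PHASE_CLEANUP_INDEX      /* calls amvacuumcleanup */
} LVParallelPhase;

typedef struct LVSharedSketch
{
    Oid             relid;
    LVParallelPhase phase;      /* set by the leader for each round */
    /* ... dead tuple space, per-index stats, etc. ... */
} LVSharedSketch;

but that is essentially the same information as the current for_cleanup
flag, just under another name.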

>  and lazy_end is
> rather confusing.

Ah, I used "lazy" as prefix of function in vacuumlazy.c. Fixed.

> There's no explanation why "reinitialization"
> == "!for_cleanup".  In the first place,
> lazy_begin_parallel_index_vacuum and
> lazy_end_parallel_index_vacuum are called only from the function
> and rather short so it doesn't seem reasonable that the are
> independend functions.

Okay agreed, fixed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, Apr 10, 2019 at 2:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Apr 8, 2019 at 7:25 PM Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >
> > Hello.
> >
> > # Is this still living? I changed the status to "needs review"
> >
> > At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAuD3txrxucnVtM6NGo=JGSjs3VDkoCzN0jGz_egc_82g@mail.gmail.com>
> > > > Indeed. How about the following description?
> > > >
> > >
> > > Attached the updated version patches.
> >
> > Thanks.
> >
>
> Thank you for reviewing the patch!
>
> > heapam.h is including access/parallel.h but the file doesn't use
> > parallel.h stuff and storage/shm_toc.h and storage/dsm.h are
> > enough.
>
> Fixed.
>
> >
> > + * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM
> > + * keys conflicting with plan_node_id we can use small integers.
> >
> > Yeah, this is right, but "plan_node_id" seems abrupt
> > there. Please prepend "differently from parallel execution code"
> > or .. I think no excuse is needed to use that numbers. The
> > executor code is already making an excuse for the large numbers
> > as unusual instead.
>
> Fixed.
>
> >
> > + * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel
> > + * mode and prepared the DSM segments.
> > + */
> > +#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL)
> >
> > we *are* in?
>
> Fixed.
>
> >
> > The name "IsInParallleVacuum()" looks (to me) like suggesting
> > "this process is a parallel vacuum worker".  How about
> > ParallelVacuumIsActive?
>
> Fixed.
>
> >
> >
> > +typedef struct LVIndStats
> > +typedef struct LVDeadTuples
> > +typedef struct LVShared
> > +typedef struct LVParallelState
> >
> > The names are confusing, and the name LVShared is too
> > generic. Shared-only structs are better to be marked in the name.
> > That is, maybe it would be better that LVIndStats were
> > LVSharedIndStats and LVShared were LVSharedRelStats.
>
> Hmm, LVShared actually also stores various things that are not
> relevant to the relation. I'm not sure it's a good idea to rename
> it to LVSharedRelStats. When we support parallel vacuum for other
> vacuum steps, adding a struct for storing only relation statistics
> might work well.
>
> >
> > It might be better that LVIndStats were moved out from LVShared,
> > but I'm not confident.
> >
> > +static void
> > +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel
> > ...
> > +       lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup);
> > ...
> > +       do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats,
> > +                                  lps->lvshared, vacrelstats->dead_tuples);
> > ...
> > +       lazy_end_parallel_index_vacuum(lps, !for_cleanup);
> >
> > The function takes the parameter for_cleanup, but the flag is
> > used by the three subfunctions in utterly ununified way. It seems
> > to me useless to store for_cleanup in lvshared
>
> I think that we need to store for_cleanup, or something telling
> vacuum workers whether to do index vacuuming or index cleanup, in
> lvshared. Or can we use something else instead?
>
> >  and lazy_end is
> > rather confusing.
>
> Ah, I used "lazy" as prefix of function in vacuumlazy.c. Fixed.
>
> > There's no explanation why "reinitialization"
> > == "!for_cleanup".  In the first place,
> > lazy_begin_parallel_index_vacuum and
> > lazy_end_parallel_index_vacuum are called only from the function
> > and rather short so it doesn't seem reasonable that the are
> > independend functions.
>
> Okay agreed, fixed.
>

Since the previous version patch conflicts with current HEAD, I've
attached the updated version patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hello

I reviewed v25 patches and have just a few notes.

missed synopsis for "PARALLEL" option (<synopsis> block in doc/src/sgml/ref/vacuum.sgml )
missed prototype for vacuum_log_cleanup_info in "non-export function prototypes"

>    /*
>     * Do post-vacuum cleanup, and statistics update for each index if
>     * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do
>     * only post-vacum cleanup and update statistics at the end of parallel
>     * lazy vacuum.
>     */
>    if (vacrelstats->useindex)
>        lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
>                                       indstats, lps, true);
>
>    if (ParallelVacuumIsActive(lps))
>    {
>        /* End parallel mode and update index statistics */
>        end_parallel_vacuum(lps, Irel, nindexes);
>    }

I personally do not like updating statistics in different places.
Can we change lazy_vacuum_or_cleanup_indexes to write stats for both parallel and non-parallel cases? I mean something like this:

>    if (ParallelVacuumIsActive(lps))
>    {
>        /* Do parallel index vacuuming or index cleanup */
>        lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel,
>                                                nindexes, stats,
>                                                lps, for_cleanup);
>        if (for_cleanup)
>        {
>            ...
>            for (i = 0; i < nindexes; i++)
>                lazy_update_index_statistics(...);
>        }
>        return;
>    }

So all lazy_update_index_statistics calls would be in one place. lazy_parallel_vacuum_or_cleanup_indexes is called only from the parallel leader and waits for all workers. Possibly we can update the stats in lazy_parallel_vacuum_or_cleanup_indexes after the WaitForParallelWorkersToFinish call.

Also, a discussion question: will the vacuumdb parameters --parallel= and --jobs= confuse users? Do we need more description for these options?

regards, Sergei

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Since the previous version patch conflicts with current HEAD, I've
> attached the updated version patches.
>

Review comments:
------------------------------
*
      indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>.

/which further/which is further

*
+ * index vacuuming or index cleanup, we launch parallel worker processes. Once
+ * all indexes are processed the parallel worker processes exit and the leader
+ * process re-initializes the DSM segment while keeping recorded dead tuples.

It is not clear for this comment why it re-initializes the DSM segment
instead of destroying it once the index work is done by workers.  Can
you elaborate a bit more in the comment?

*
+ * Note that all parallel workers live during one either index vacuuming or

It seems usage of 'one' is not required in the above sentence.

*
+
+/*
+ * Compute the number of parallel worker process to request.

/process/processes

*
+static int
+compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
+{
+ int parallel_workers = 0;
+
+ Assert(nrequested >= 0);
+
+ if (nindexes <= 1)
+ return 0;

I think here, in the beginning, you can also check if
max_parallel_maintenance_workers are 0, then return.

*
In function compute_parallel_workers, don't we want to cap the number
of workers based on maintenance_work_mem as we do in
plan_create_index_workers?

The basic point is how we want to treat maintenance_work_mem for
this feature.  Do we want all workers together to use at most
maintenance_work_mem, or is each worker allowed to use
maintenance_work_mem?  I would prefer the former unless we have a good
reason to follow the latter strategy.

Accordingly, we might need to update the below paragraph in docs:
"Note that parallel utility commands should not consume substantially
more memory than equivalent non-parallel operations.  This strategy
differs from that of parallel query, where resource limits generally
apply per worker process.  Parallel utility commands treat the
resource limit <varname>maintenance_work_mem</varname> as a limit to
be applied to the entire utility command, regardless of the number of
parallel worker processes."

*
+static int
+compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
+{
+ int parallel_workers = 0;
+
+ Assert(nrequested >= 0);
+
+ if (nindexes <= 1)
+ return 0;
+
+ if (nrequested > 0)
+ {
+ /* At least one index is taken by the leader process */
+ parallel_workers = Min(nrequested, nindexes - 1);
+ }

I think here we always allow the leader to participate.  It seems to
me we have some way to disable leader participation.  During the
development of previous parallel operations, we find it quite handy to
catch bugs. We might want to mimic what has been done for index with
DISABLE_LEADER_PARTICIPATION.

*
+/*
+ * DSM keys for parallel lazy vacuum. Unlike other parallel execution code,
+ * since we don't need to worry about DSM keys conflicting with plan_node_id
+ * we can use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3

I think it would be better if these keys should be assigned numbers in
a way we do for other similar operation like create index.  See below
defines
in code:
/* Magic numbers for parallel state sharing */
#define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001)

This will make the code consistent with other parallel operations.

*
+begin_parallel_vacuum(LVRelStats *vacrelstats, Oid relid, BlockNumber nblocks,
+   int nindexes, int nrequested)
{
..
+ est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
..
}

I think here you should use SizeOfLVDeadTuples as defined by patch.

*
+ keys++;
+
+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
+ maxtuples = compute_max_dead_tuples(nblocks, true);
+ est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
+    mul_size(sizeof(ItemPointerData), maxtuples)));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+ keys++;
+
+ shm_toc_estimate_keys(&pcxt->estimator, keys);
+
+ /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);

The code style looks inconsistent here.  In some cases, you are
calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
and in other cases, you are accumulating keys.  I think it is better
to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
in all cases.
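
In other words, something like this (sketch, reusing the names from the
quoted hunk):

    /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
    maxtuples = compute_max_dead_tuples(nblocks, true);
    est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
                                       mul_size(sizeof(ItemPointerData), maxtuples)));
    shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
    shm_toc_estimate_keys(&pcxt->estimator, 1);

    /* Estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
    querylen = strlen(debug_query_string);
    shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
    shm_toc_estimate_keys(&pcxt->estimator, 1);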

*
+void
+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
..
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
..
}

I think the last parameter in shm_toc_lookup should be false.  Is
there a reason for passing it as true?

*
+void
+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
..
+ /* Open table */
+ onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
..
}

I don't think it is a good idea to assume the lock mode as
ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
is a change in lock level for the vacuum process, we might forget to
update it here.  I think it is better if we can get this information
from the master backend.

*
+end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
{
..
+ /* Shutdown worker processes and destroy the parallel context */
+ WaitForParallelWorkersToFinish(lps->pcxt);
..
}

Do we really need to call WaitForParallelWorkersToFinish here as it
must have been called in lazy_parallel_vacuum_or_cleanup_indexes
before this time?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sat, Sep 21, 2019 at 6:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Since the previous version patch conflicts with current HEAD, I've
> attached the updated version patches.
>

Review comments:
------------------------------

Sawada-San, are you planning to work on the review comments?  I can take care of this and then proceed with further review if you are tied up with something else.
 
*
+/*
+ * DSM keys for parallel lazy vacuum. Unlike other parallel execution code,
+ * since we don't need to worry about DSM keys conflicting with plan_node_id
+ * we can use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3

I think it would be better if these keys should be assigned numbers in
a way we do for other similar operation like create index.  See below
defines
in code:
/* Magic numbers for parallel state sharing */
#define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001)

This will make the code consistent with other parallel operations.

I think we don't need to handle this comment.  Today, I read the other emails in the thread and noticed that you have done this based on a comment by Robert, and that decision seems wise to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Oct 1, 2019 at 10:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Sep 21, 2019 at 6:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >
>> > Since the previous version patch conflicts with current HEAD, I've
>> > attached the updated version patches.
>> >
>>
>> Review comments:
>> ------------------------------
>
>
> Sawada-San, are you planning to work on the review comments?  I can take care of this and then proceed with further review if you are tied up with something else.
>

Thank you for reviewing this patch.

Yes I'm addressing your comments and will submit the updated patch soon.

> I think we don't need to handle this comment.  Today, I read the other emails in the thread and noticed that you have done this based on a comment by Robert, and that decision seems wise to me.

Understood.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Since the previous version patch conflicts with current HEAD, I've
> > attached the updated version patches.
> >
>

Thank you for reviewing this patch!

> Review comments:
> ------------------------------
> *
>       indexes on the relation which further limited by
> +      <xref linkend="guc-max-parallel-workers-maintenance"/>.
>
> /which further/which is further
>

Fixed.

> *
> + * index vacuuming or index cleanup, we launch parallel worker processes. Once
> + * all indexes are processed the parallel worker processes exit and the leader
> + * process re-initializes the DSM segment while keeping recorded dead tuples.
>
> It is not clear for this comment why it re-initializes the DSM segment
> instead of destroying it once the index work is done by workers.  Can
> you elaborate a bit more in the comment?

Added more explanation.

>
> *
> + * Note that all parallel workers live during one either index vacuuming or
>
> It seems usage of 'one' is not required in the above sentence.

Removed.

>
> *
> +
> +/*
> + * Compute the number of parallel worker process to request.
>
> /process/processes

Fixed.

>
> *
> +static int
> +compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
> +{
> + int parallel_workers = 0;
> +
> + Assert(nrequested >= 0);
> +
> + if (nindexes <= 1)
> + return 0;
>
> I think here, in the beginning, you can also check if
> max_parallel_maintenance_workers are 0, then return.
>

Agreed, fixed.

> *
> In function compute_parallel_workers, don't we want to cap the number
> of workers based on maintenance_work_mem as we do in
> plan_create_index_workers?
>
> The basic point is how we want to treat maintenance_work_mem for
> this feature.  Do we want all workers together to use at most
> maintenance_work_mem, or is each worker allowed to use
> maintenance_work_mem?  I would prefer the former unless we have a good
> reason to follow the latter strategy.
>
> Accordingly, we might need to update the below paragraph in docs:
> "Note that parallel utility commands should not consume substantially
> more memory than equivalent non-parallel operations.  This strategy
> differs from that of parallel query, where resource limits generally
> apply per worker process.  Parallel utility commands treat the
> resource limit <varname>maintenance_work_mem</varname> as a limit to
> be applied to the entire utility command, regardless of the number of
> parallel worker processes."

I'd also prefer to use at most maintenance_work_mem during parallel
vacuum regardless of the number of parallel workers. This is the
current implementation. In lazy vacuum, maintenance_work_mem is used
to record the item pointers of dead tuples; this is done by the leader
process, and the worker processes just refer to them when vacuuming
dead index tuples. Even if the user sets a small amount of
maintenance_work_mem, parallel vacuum would still be helpful because
index vacuuming would still take time. So I thought we should cap the
number of parallel workers by the number of indexes rather than by
maintenance_work_mem.
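
To illustrate, the capping would look roughly like the sketch below
(not the exact patch code; it also folds in the
max_parallel_maintenance_workers check you suggested earlier):

    /*
     * Sketch only: cap the number of workers by the number of indexes and
     * by max_parallel_maintenance_workers, not by maintenance_work_mem.
     * One index is processed by the leader process itself.
     */
    static int
    compute_parallel_workers_sketch(int nrequested, int nindexes)
    {
        int     parallel_workers;

        if (nindexes <= 1 || max_parallel_maintenance_workers == 0)
            return 0;

        if (nrequested > 0)
            parallel_workers = Min(nrequested, nindexes - 1);
        else
            parallel_workers = nindexes - 1;

        return Min(parallel_workers, max_parallel_maintenance_workers);
    }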

>
> *
> +static int
> +compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
> +{
> + int parallel_workers = 0;
> +
> + Assert(nrequested >= 0);
> +
> + if (nindexes <= 1)
> + return 0;
> +
> + if (nrequested > 0)
> + {
> + /* At least one index is taken by the leader process */
> + parallel_workers = Min(nrequested, nindexes - 1);
> + }
>
> I think here we always allow the leader to participate.  It seems to
> me we have some way to disable leader participation.  During the
> development of previous parallel operations, we find it quite handy to
> catch bugs. We might want to mimic what has been done for index with
> DISABLE_LEADER_PARTICIPATION.

Added a way to disable leader participation.

>
> *
> +/*
> + * DSM keys for parallel lazy vacuum. Unlike other parallel execution code,
> + * since we don't need to worry about DSM keys conflicting with plan_node_id
> + * we can use small integers.
> + */
> +#define PARALLEL_VACUUM_KEY_SHARED 1
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
>
> I think it would be better if these keys should be assigned numbers in
> a way we do for other similar operation like create index.  See below
> defines
> in code:
> /* Magic numbers for parallel state sharing */
> #define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001)
>
> This will make the code consistent with other parallel operations.

I skipped this comment per your previous mail.

>
> *
> +begin_parallel_vacuum(LVRelStats *vacrelstats, Oid relid, BlockNumber nblocks,
> +   int nindexes, int nrequested)
> {
> ..
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> ..
> }
>
> I think here you should use SizeOfLVDeadTuples as defined by patch.

Fixed.

>
> *
> + keys++;
> +
> + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> + maxtuples = compute_max_dead_tuples(nblocks, true);
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> +    mul_size(sizeof(ItemPointerData), maxtuples)));
> + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> + keys++;
> +
> + shm_toc_estimate_keys(&pcxt->estimator, keys);
> +
> + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> + querylen = strlen(debug_query_string);
> + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
>
> The code style looks inconsistent here.  In some cases, you are
> calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> and in other cases, you are accumulating keys.  I think it is better
> to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> in all cases.

Fixed. But there is some code that calls shm_toc_estimate_keys for
multiple keys at once, for example in nbtsort.c and parallel.c. What
is the difference?

>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> {
> ..
> + /* Set debug_query_string for individual workers */
> + sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
> ..
> }
>
> I think the last parameter in shm_toc_lookup should be false.  Is
> there a reason for passing it as true?

My bad, fixed.

>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> +{
> ..
> + /* Open table */
> + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
> ..
> }
>
> I don't think it is a good idea to assume the lock mode as
> ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
> is a change in lock level for the vacuum process, we might forget to
> update it here.  I think it is better if we can get this information
> from the master backend.

So did you mean to declare the lock mode for lazy vacuum somewhere as
a global variable and use it in both try_relation_open in the leader
process and relation_open in the worker process? Otherwise we would
end up adding something like shared->lmode = ShareUpdateExclusiveLock
during parallel context initialization, which doesn't seem to resolve
your concern.
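
To make that concrete, the shared-state version would be something
like the sketch below (the lmode field is illustrative), and the
leader still has to hard-code the lock mode somewhere:

    /* sketch: the leader records the lock mode it used ... */
    shared->lmode = ShareUpdateExclusiveLock;

    /* ... and the worker opens the relation with whatever was recorded */
    onerel = heap_open(lvshared->relid, lvshared->lmode);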

>
> *
> +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
> {
> ..
> + /* Shutdown worker processes and destroy the parallel context */
> + WaitForParallelWorkersToFinish(lps->pcxt);
> ..
> }
>
> Do we really need to call WaitForParallelWorkersToFinish here as it
> must have been called in lazy_parallel_vacuum_or_cleanup_indexes
> before this time?

No, removed.

I've attached an updated version of the patch that incorporates your
comments, excluding the ones that need more discussion. I'll update it
again after that discussion.

Regards,

--
Masahiko Sawada

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
I have started reviewing this patch and I have some cosmetic comments.
I will continue the review tomorrow.

+This change adds PARALLEL option to VACUUM command that enable us to
+perform index vacuuming and index cleanup with background
+workers. Indivisual

/s/Indivisual/Individual/

+ * parallel worker processes. Individual indexes is processed by one vacuum
+ * process. At beginning of lazy vacuum (at lazy_scan_heap) we prepare the

/s/Individual indexes is processed/Individual indexes are processed/
/s/At beginning/ At the beginning

+ * parallel workers. In parallel lazy vacuum, we enter parallel mode and
+ * create the parallel context and the DSM segment before starting heap
+ * scan.

Can we extend the comment to explain why we do that before starting
the heap scan?

+ else
+ {
+ if (for_cleanup)
+ {
+ if (lps->nworkers_requested > 0)
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index cleanup
(planned: %d, requested %d)",
+   "launched %d parallel vacuum workers for index cleanup (planned:
%d, requsted %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers,
+ lps->nworkers_requested);
+ else
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers);
+ }
+ else
+ {
+ if (lps->nworkers_requested > 0)
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index vacuuming
(planned: %d, requested %d)",
+   "launched %d parallel vacuum workers for index vacuuming (planned:
%d, requested %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers,
+ lps->nworkers_requested);
+ else
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index vacuuming
(planned: %d)",
+   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers);
+ }

In multiple places I see a lot of duplicate code for the for_cleanup
true and false cases.  The only difference is whether the error
message says index cleanup or index vacuuming; otherwise the code is
exactly the same for both cases.  Can't we create a string based on
the value of for_cleanup and append it to the error message?  That way
we can avoid duplicating this in many places.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> I have started reviewing this patch and I have some cosmetic comments.
> I will continue the review tomorrow.
>

Thank you for reviewing the patch!

> +This change adds PARALLEL option to VACUUM command that enable us to
> +perform index vacuuming and index cleanup with background
> +workers. Indivisual
>
> /s/Indivisual/Individual/

Fixed.

>
> + * parallel worker processes. Individual indexes is processed by one vacuum
> + * process. At beginning of lazy vacuum (at lazy_scan_heap) we prepare the
>
> /s/Individual indexes is processed/Individual indexes are processed/
> /s/At beginning/ At the beginning

Fixed.

>
> + * parallel workers. In parallel lazy vacuum, we enter parallel mode and
> + * create the parallel context and the DSM segment before starting heap
> + * scan.
>
> Can we extend the comment to explain why we do that before starting
> the heap scan?

Added more comment.

>
> + else
> + {
> + if (for_cleanup)
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned:
> %d, requsted %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
> + else
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned:
> %d, requested %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
>
> Multiple places I see a lot of duplicate code for for_cleanup is true
> or false.  The only difference is in the error message whether we give
> index cleanup or index vacuuming otherwise complete code is the same
> for
> both the cases.  Can't we create some string and based on the value of
> the for_cleanup and append it in the error message that way we can
> avoid duplicating this at many places?

I think it's necessary for translation. IIUC, if we construct the
message at runtime it cannot be translated.
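
To spell the concern out with a simplified sketch: the message catalog
is built from the literal format strings passed to ngettext(), so a
format string assembled at runtime is invisible to message extraction
(nlaunched/nplanned below are just placeholders):

    /* translatable: the format strings are literals, so they end up in the catalog */
    appendStringInfo(&buf,
                     ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
                              "launched %d parallel vacuum workers for index cleanup (planned: %d)",
                              nlaunched),
                     nlaunched, nplanned);

    /* not translatable: the format string is built at runtime and never
     * reaches the translators */
    appendStringInfo(&buf, "launched %d parallel vacuum workers for %s (planned: %d)",
                     nlaunched, for_cleanup ? "index cleanup" : "index vacuuming",
                     nplanned);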

Attached the updated patch.

Regards,

--
Masahiko Sawada

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> *
> In function compute_parallel_workers, don't we want to cap the number
> of workers based on maintenance_work_mem as we do in
> plan_create_index_workers?
>
> The basic point is how do we want to treat maintenance_work_mem for
> this feature.  Do we want all workers to use at max the
> maintenance_work_mem or each worker is allowed to use
> maintenance_work_mem?  I would prefer earlier unless we have good
> reason to follow the later strategy.
>
> Accordingly, we might need to update the below paragraph in docs:
> "Note that parallel utility commands should not consume substantially
> more memory than equivalent non-parallel operations.  This strategy
> differs from that of parallel query, where resource limits generally
> apply per worker process.  Parallel utility commands treat the
> resource limit <varname>maintenance_work_mem</varname> as a limit to
> be applied to the entire utility command, regardless of the number of
> parallel worker processes."

I'd also prefer to use maintenance_work_mem at max during parallel
vacuum regardless of the number of parallel workers. This is current
implementation. In lazy vacuum the maintenance_work_mem is used to
record itempointer of dead tuples. This is done by leader process and
worker processes just refers them for vacuuming dead index tuples.
Even if user sets a small amount of maintenance_work_mem the parallel
vacuum would be helpful as it still would take a time for index
vacuuming. So I thought we should cap the number of parallel workers
by the number of indexes rather than maintenance_work_mem.


Isn't that true only if we never use maintenance_work_mem during index cleanup?  However, I think we are using it during index cleanup, see for example ginInsertCleanup.  I think before reaching any conclusion about what to do about this, we first need to establish whether this is a problem.  If I am correct, then only some of the index cleanups (like the gin index) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
 
> *
> + keys++;
> +
> + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> + maxtuples = compute_max_dead_tuples(nblocks, true);
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> +    mul_size(sizeof(ItemPointerData), maxtuples)));
> + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> + keys++;
> +
> + shm_toc_estimate_keys(&pcxt->estimator, keys);
> +
> + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> + querylen = strlen(debug_query_string);
> + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
>
> The code style looks inconsistent here.  In some cases, you are
> calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> and in other cases, you are accumulating keys.  I think it is better
> to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> in all cases.

Fixed. But there are some code that call shm_toc_estimate_keys for
multiple keys in for example nbtsort.c and parallel.c. What is the
difference?


We can do it either way, depending on the situation.  For example, in nbtsort.c, there is an if check based on which the 'number of keys' can vary.  I think here we should try to write it in a way that does not make the reader wonder why it is done in a particular way.  This is the reason I told you to be consistent.
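
Concretely, the consistent style I had in mind is roughly this (sketch
only, with the chunk-size variables elided/illustrative):

    /* pair every shm_toc_estimate_chunk() with its shm_toc_estimate_keys() */
    shm_toc_estimate_chunk(&pcxt->estimator, est_shared);      /* PARALLEL_VACUUM_KEY_SHARED */
    shm_toc_estimate_keys(&pcxt->estimator, 1);

    shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);  /* PARALLEL_VACUUM_KEY_DEAD_TUPLES */
    shm_toc_estimate_keys(&pcxt->estimator, 1);

    shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);    /* PARALLEL_VACUUM_KEY_QUERY_TEXT */
    shm_toc_estimate_keys(&pcxt->estimator, 1);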
 
>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> +{
> ..
> + /* Open table */
> + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
> ..
> }
>
> I don't think it is a good idea to assume the lock mode as
> ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
> is a change in lock level for the vacuum process, we might forget to
> update it here.  I think it is better if we can get this information
> from the master backend.

So did you mean to declare the lock mode for lazy vacuum somewhere as
a global variable and use it in both try_relation_open in the leader
process and relation_open in the worker process? Otherwise we would
end up with adding something like shared->lmode =
ShareUpdateExclusiveLock during parallel context initialization, which
seems not to resolve your concern.


I was thinking that we could find a way to pass the lockmode we used in vacuum_rel, but I guess we would need to pass it through multiple functions, which will be a bit inconvenient.  OTOH, today, I checked nbtsort.c (_bt_parallel_build_main) and found that there also we use it directly instead of passing it from the master backend.  I think we can leave it as you have it in the patch, but add a comment on why it is okay to use that lock mode?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> + else
> + {
> + if (for_cleanup)
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned:
> %d, requsted %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
> + else
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned:
> %d, requested %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
>
> Multiple places I see a lot of duplicate code for for_cleanup is true
> or false.  The only difference is in the error message whether we give
> index cleanup or index vacuuming otherwise complete code is the same
> for
> both the cases.  Can't we create some string and based on the value of
> the for_cleanup and append it in the error message that way we can
> avoid duplicating this at many places?

I think it's necessary for translation. IIUC if we construct the
message it cannot be translated.


Do we really need to log all those messages?  The other places where we launch parallel workers don't seem to use such messages.  Why do you think it is important to log the messages here when other cases don't?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
Some more comments..
1.
+ for (idx = 0; idx < nindexes; idx++)
+ {
+ if (!for_cleanup)
+ lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
+   vacrelstats->old_live_tuples);
+ else
+ {
+ /* Cleanup one index and update index statistics */
+ lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
+    vacrelstats->tupcount_pages < vacrelstats->rel_pages);
+
+ lazy_update_index_statistics(Irel[idx], stats[idx]);
+
+ if (stats[idx])
+ pfree(stats[idx]);
+ }

I think instead of checking the for_cleanup variable for every index
in the loop, we had better move the loop inside, as shown below:

if (!for_cleanup)
for (idx = 0; idx < nindexes; idx++)
lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
else
for (idx = 0; idx < nindexes; idx++)
{
lazy_cleanup_index
lazy_update_index_statistics
...
}

2.
+static void
+lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
+    int nindexes, IndexBulkDeleteResult **stats,
+    LVParallelState *lps, bool for_cleanup)
+{
+ int idx;
+
+ Assert(!IsParallelWorker());
+
+ /* no job if the table has no index */
+ if (nindexes <= 0)
+ return;

Wouldn't it be a good idea to call this function only if nindexes > 0?

3.
+/*
+ * Vacuum or cleanup indexes with parallel workers. This function must be used
+ * by the parallel vacuum leader process.
+ */
+static void
+lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
Relation *Irel,
+ int nindexes, IndexBulkDeleteResult **stats,
+ LVParallelState *lps, bool for_cleanup)

If you look at this function, there is not much common code between
the for_cleanup and non-for_cleanup cases except these 3-4 statements.
LaunchParallelWorkers(lps->pcxt);
/* Create the log message to report */
initStringInfo(&buf);
...
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);

Other than that you have got a lot of checks like this
+ if (!for_cleanup)
+ {
+ }
+ else
+ {
}

I think the code would be much more readable if we had 2 functions,
one for vacuum (lazy_parallel_vacuum_indexes) and another for
cleanup (lazy_parallel_cleanup_indexes).

4.
 * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * Lazy vacuum supports parallel execution with parallel worker processes. In
+ * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
+ * parallel worker processes. Individual indexes are processed by one vacuum

Spacing after the "." is not uniform; the previous comment uses 2
spaces and the newly added one uses 1 space.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> *
> +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
> {
> ..
> + /* Shutdown worker processes and destroy the parallel context */
> + WaitForParallelWorkersToFinish(lps->pcxt);
> ..
> }
>
> Do we really need to call WaitForParallelWorkersToFinish here as it
> must have been called in lazy_parallel_vacuum_or_cleanup_indexes
> before this time?

No, removed.

+ /* Shutdown worker processes and destroy the parallel context */
+ DestroyParallelContext(lps->pcxt);

But you forgot to update the comment.

Few more comments:
--------------------------------
1.
+/*
+ * Parallel Index vacuuming and index cleanup routine used by both the leader
+ * process and worker processes. Unlike single process vacuum, we don't update
+ * index statistics after cleanup index since it is not allowed during
+ * parallel mode, instead copy index bulk-deletion results from the local
+ * memory to the DSM segment and update them at the end of parallel lazy
+ * vacuum.
+ */
+static void
+do_parallel_vacuum_or_cleanup_indexes(Relation *Irel, int nindexes,
+  IndexBulkDeleteResult **stats,
+  LVShared *lvshared,
+  LVDeadTuples *dead_tuples)
+{
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
+
+ /* Done for all indexes? */
+ if (idx >= nindexes)
+ break;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result
+ * if someone has already updated it.
+ */
+ if (lvshared->indstats[idx].updated &&
+ stats[idx] == NULL)
+ stats[idx] = &(lvshared->indstats[idx].stats);
+
+ /* Do vacuum or cleanup one index */
+ if (!lvshared->for_cleanup)
+ lazy_vacuum_index(Irel[idx], &stats[idx], dead_tuples,
+  lvshared->reltuples);
+ else
+ lazy_cleanup_index(Irel[idx], &stats[idx], lvshared->reltuples,
+   lvshared->estimated_count);

It seems we always run index cleanup via a parallel worker, which seems overkill because index cleanup generally scans the index only when bulkdelete was not performed.  In some cases, like the hash index, it doesn't do anything even if bulkdelete is not called.  OTOH, for the brin index, cleanup does the main job, but we might be able to always allow index cleanup by a parallel worker for brin indexes if we remove the allocation in brinbulkdelete, which I am not sure is of any use.

I think we should call cleanup via a parallel worker only if bulkdelete hasn't been performed on the index.
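
In other words, something like the sketch below inside the worker loop
(the bulkdelete_done flag is illustrative; the patch would need to
track per index whether ambulkdelete has already run):

    /*
     * Sketch: don't hand cleanup of this index to a worker if bulkdelete
     * has already been performed for it; amvacuumcleanup is then usually
     * cheap and the leader can do it by itself.
     */
    if (lvshared->for_cleanup && lvshared->indstats[idx].bulkdelete_done)
        continue;

    if (!lvshared->for_cleanup)
        lazy_vacuum_index(Irel[idx], &stats[idx], dead_tuples,
                          lvshared->reltuples);
    else
        lazy_cleanup_index(Irel[idx], &stats[idx], lvshared->reltuples,
                           lvshared->estimated_count);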

2.
- for (i = 0; i < nindexes; i++)
- lazy_vacuum_index(Irel[i],
-  &indstats[i],
-  vacrelstats);
+ lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+   indstats, lps, false);

Indentation is not proper.  You might want to run pgindent.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
vignesh C
Date:
On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
One comment:
We can check that parallel_workers is within range, i.e. not more than
MAX_PARALLEL_WORKER_LIMIT.
+ int parallel_workers = 0;
+
+ if (optarg != NULL)
+ {
+ parallel_workers = atoi(optarg);
+ if (parallel_workers <= 0)
+ {
+ pg_log_error("number of parallel workers must be at least 1");
+ exit(1);
+ }
+ }
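
i.e. something along these lines (just a sketch; whether
MAX_PARALLEL_WORKER_LIMIT is the exact limit to use here is up to the
patch):

    parallel_workers = atoi(optarg);
    if (parallel_workers <= 0 || parallel_workers > MAX_PARALLEL_WORKER_LIMIT)
    {
        pg_log_error("number of parallel workers must be between 1 and %d",
                     MAX_PARALLEL_WORKER_LIMIT);
        exit(1);
    }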

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > *
>> > In function compute_parallel_workers, don't we want to cap the number
>> > of workers based on maintenance_work_mem as we do in
>> > plan_create_index_workers?
>> >
>> > The basic point is how do we want to treat maintenance_work_mem for
>> > this feature.  Do we want all workers to use at max the
>> > maintenance_work_mem or each worker is allowed to use
>> > maintenance_work_mem?  I would prefer earlier unless we have good
>> > reason to follow the later strategy.
>> >
>> > Accordingly, we might need to update the below paragraph in docs:
>> > "Note that parallel utility commands should not consume substantially
>> > more memory than equivalent non-parallel operations.  This strategy
>> > differs from that of parallel query, where resource limits generally
>> > apply per worker process.  Parallel utility commands treat the
>> > resource limit <varname>maintenance_work_mem</varname> as a limit to
>> > be applied to the entire utility command, regardless of the number of
>> > parallel worker processes."
>>
>> I'd also prefer to use maintenance_work_mem at max during parallel
>> vacuum regardless of the number of parallel workers. This is current
>> implementation. In lazy vacuum the maintenance_work_mem is used to
>> record itempointer of dead tuples. This is done by leader process and
>> worker processes just refers them for vacuuming dead index tuples.
>> Even if user sets a small amount of maintenance_work_mem the parallel
>> vacuum would be helpful as it still would take a time for index
>> vacuuming. So I thought we should cap the number of parallel workers
>> by the number of indexes rather than maintenance_work_mem.
>>
>
> Isn't that true only if we never use maintenance_work_mem during index cleanup?  However, I think we are using during
> index cleanup, see forex. ginInsertCleanup.  I think before reaching any conclusion about what to do about this, first
> we need to establish whether this is a problem.  If I am correct, then only some of the index cleanups (like gin index)
> use maintenance_work_mem, so we need to consider that point while designing a solution for this.
>

I got your point. Currently a single-process lazy vacuum could
consume up to (maintenance_work_mem * 2) memory, because we do index
cleanup while still holding the dead tuple space, as you mentioned.
And ginInsertCleanup is also called at the beginning of ginbulkdelete.
In the current parallel lazy vacuum, each parallel vacuum worker could
consume additional memory, apart from the memory used for the heap
scan, depending on the implementation of the target index AM. Given
the current single and parallel vacuum implementations, it would be
better to control the total amount of memory rather than the number of
parallel workers. So one approach I came up with is to make every
vacuum worker use (maintenance_work_mem / # of participants) as its
new maintenance_work_mem. It might be too small in some cases, but it
doesn't consume more memory than a single lazy vacuum as long as the
index AM doesn't use more memory regardless of maintenance_work_mem.
I think that really depends on the implementation of the index AM.
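
As a rough sketch of that approach (the maintenance_work_mem_worker
field is illustrative; the value would be passed to the workers through
the shared state):

    /* split the budget so the total stays within maintenance_work_mem */
    int     nparticipants = lps->pcxt->nworkers + 1;    /* planned workers + leader */

    shared->maintenance_work_mem_worker =
        Max(maintenance_work_mem / nparticipants, 1024);    /* in kB, keep a sane floor */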

>>
>> > *
>> > + keys++;
>> > +
>> > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>> > + maxtuples = compute_max_dead_tuples(nblocks, true);
>> > + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
>> > +    mul_size(sizeof(ItemPointerData), maxtuples)));
>> > + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
>> > + keys++;
>> > +
>> > + shm_toc_estimate_keys(&pcxt->estimator, keys);
>> > +
>> > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
>> > + querylen = strlen(debug_query_string);
>> > + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
>> > + shm_toc_estimate_keys(&pcxt->estimator, 1);
>> >
>> > The code style looks inconsistent here.  In some cases, you are
>> > calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
>> > and in other cases, you are accumulating keys.  I think it is better
>> > to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
>> > in all cases.
>>
>> Fixed. But there are some code that call shm_toc_estimate_keys for
>> multiple keys in for example nbtsort.c and parallel.c. What is the
>> difference?
>>
>
> We can do it, either way, depending on the situation.  For example, in nbtsort.c, there is an if check based on which
> 'number of keys' can vary.  I think here we should try to write in a way that it should not confuse the reader why it is
> done in a particular way.  This is the reason I told you to be consistent.

Understood. Thank you for explanation!

>
>>
>> >
>> > *
>> > +void
>> > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>> > +{
>> > ..
>> > + /* Open table */
>> > + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
>> > ..
>> > }
>> >
>> > I don't think it is a good idea to assume the lock mode as
>> > ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
>> > is a change in lock level for the vacuum process, we might forget to
>> > update it here.  I think it is better if we can get this information
>> > from the master backend.
>>
>> So did you mean to declare the lock mode for lazy vacuum somewhere as
>> a global variable and use it in both try_relation_open in the leader
>> process and relation_open in the worker process? Otherwise we would
>> end up with adding something like shared->lmode =
>> ShareUpdateExclusiveLock during parallel context initialization, which
>> seems not to resolve your concern.
>>
>
> I was thinking that if we can find a way to pass the lockmode we used in vacuum_rel, but I guess we need to pass it
> through multiple functions which will be a bit inconvenient.  OTOH, today, I checked nbtsort.c (_bt_parallel_build_main)
> and found that there also we are using it directly instead of passing it from the master backend.  I think we can leave
> it as you have in the patch, but add a comment on why it is okay to use that lock mode?

Yeah agreed.
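
Something like this, perhaps (a sketch of the comment I'll add):

    /*
     * Open the table.  It is safe to assume ShareUpdateExclusiveLock here
     * because the leader has already locked the relation in this mode for
     * the duration of the vacuum (see vacuum_rel()), and a parallel worker
     * taking the same lock on the same relation cannot cause a conflict.
     */
    onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);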

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> >
>> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >
>> > + else
>> > + {
>> > + if (for_cleanup)
>> > + {
>> > + if (lps->nworkers_requested > 0)
>> > + appendStringInfo(&buf,
>> > + ngettext("launched %d parallel vacuum worker for index cleanup
>> > (planned: %d, requested %d)",
>> > +   "launched %d parallel vacuum workers for index cleanup (planned:
>> > %d, requsted %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers,
>> > + lps->nworkers_requested);
>> > + else
>> > + appendStringInfo(&buf,
>> > + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
>> > +   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers);
>> > + }
>> > + else
>> > + {
>> > + if (lps->nworkers_requested > 0)
>> > + appendStringInfo(&buf,
>> > + ngettext("launched %d parallel vacuum worker for index vacuuming
>> > (planned: %d, requested %d)",
>> > +   "launched %d parallel vacuum workers for index vacuuming (planned:
>> > %d, requested %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers,
>> > + lps->nworkers_requested);
>> > + else
>> > + appendStringInfo(&buf,
>> > + ngettext("launched %d parallel vacuum worker for index vacuuming
>> > (planned: %d)",
>> > +   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers);
>> > + }
>> >
>> > Multiple places I see a lot of duplicate code for for_cleanup is true
>> > or false.  The only difference is in the error message whether we give
>> > index cleanup or index vacuuming otherwise complete code is the same
>> > for
>> > both the cases.  Can't we create some string and based on the value of
>> > the for_cleanup and append it in the error message that way we can
>> > avoid duplicating this at many places?
>>
>> I think it's necessary for translation. IIUC if we construct the
>> message it cannot be translated.
>>
>
> Do we really need to log all those messages?  The other places where we launch parallel workers doesn't seem to be
> using such messages.  Why do you think it is important to log the messages here when other cases don't use it?

Well, I would rather think that parallel create index doesn't log
enough messages. A parallel maintenance operation is invoked manually
by the user. I can imagine that a DBA wants to cancel and retry the
operation later if enough workers are not launched. But there is no
convenient way to confirm how many parallel workers were planned and
actually launched; we have to look at the ps command or
pg_stat_activity. That's why I think the log message would be helpful
for users.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 4, 2019 at 3:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
> Some more comments..
> 1.
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + if (!for_cleanup)
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + else
> + {
> + /* Cleanup one index and update index statistics */
> + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
> +    vacrelstats->tupcount_pages < vacrelstats->rel_pages);
> +
> + lazy_update_index_statistics(Irel[idx], stats[idx]);
> +
> + if (stats[idx])
> + pfree(stats[idx]);
> + }
>
> I think instead of checking for_cleanup variable for every index of
> the loop we better move loop inside, like shown below?
>
> if (!for_cleanup)
> for (idx = 0; idx < nindexes; idx++)
> lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> else
> for (idx = 0; idx < nindexes; idx++)
> {
> lazy_cleanup_index
> lazy_update_index_statistics
> ...
> }
>
> 2.
> +static void
> +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
> +    int nindexes, IndexBulkDeleteResult **stats,
> +    LVParallelState *lps, bool for_cleanup)
> +{
> + int idx;
> +
> + Assert(!IsParallelWorker());
> +
> + /* no job if the table has no index */
> + if (nindexes <= 0)
> + return;
>
> Wouldn't it be good idea to call this function only if nindexes > 0?
>
> 3.
> +/*
> + * Vacuum or cleanup indexes with parallel workers. This function must be used
> + * by the parallel vacuum leader process.
> + */
> +static void
> +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
> Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps, bool for_cleanup)
>
> If you see this function there is no much common code between
> for_cleanup and without for_cleanup except these 3-4 statement.
> LaunchParallelWorkers(lps->pcxt);
> /* Create the log message to report */
> initStringInfo(&buf);
> ...
> /* Wait for all vacuum workers to finish */
> WaitForParallelWorkersToFinish(lps->pcxt);
>
> Other than that you have got a lot of checks like this
> + if (!for_cleanup)
> + {
> + }
> + else
> + {
> }
>
> I think code would be much redable if we have 2 functions one for
> vaccum (lazy_parallel_vacuum_indexes) and another for
> cleanup(lazy_parallel_cleanup_indexes).
>
> 4.
>  * of index scans performed.  So we don't use maintenance_work_mem memory for
>   * the TID array, just enough to hold as many heap tuples as fit on one page.
>   *
> + * Lazy vacuum supports parallel execution with parallel worker processes. In
> + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
> + * parallel worker processes. Individual indexes are processed by one vacuum
>
> Spacing after the "." is not uniform, previous comment is using 2
> space and newly
> added is using 1 space.

Few more comments
----------------------------

1.
+static int
+compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
+{
+ int parallel_workers;
+ bool leaderparticipates = true;

Seems like this function is not using onerel parameter so we can remove this.


2.
+
+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
+ maxtuples = compute_max_dead_tuples(nblocks, true);
+ est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples,
+    mul_size(sizeof(ItemPointerData), maxtuples)));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
+ querylen = strlen(debug_query_string);

for consistency with other comments change
VACUUM_KEY_QUERY_TEXT  to PARALLEL_VACUUM_KEY_QUERY_TEXT


3.
@@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
  (!wraparound ? VACOPT_SKIP_LOCKED : 0);
  tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT;
  tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
+ /* parallel lazy vacuum is not supported for autovacuum */
+ tab->at_params.nworkers = -1;

What is the reason for the same?  Can we explain in the comments?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>
> Do we really need to log all those messages?  The other places where we launch parallel workers doesn't seem to be using such messages.  Why do you think it is important to log the messages here when other cases don't use it?

Well I would rather think that parallel create index doesn't log
enough messages. Parallel maintenance operation is invoked manually by
user. I can imagine that DBA wants to cancel and try the operation
again later if enough workers are not launched. But there is not a
convenient way to confirm how many parallel workers planned and
actually launched. We need to see ps command or pg_stat_activity.
That's why I think that log message would be helpful for users.

Hmm, what is the guarantee that at a later time the user will get the required number of workers?  I think if the user decides to vacuum, then she would want it to start sooner.  Also, to cancel the vacuum for this reason, the user needs to monitor the logs, which doesn't seem to be an easy thing considering this information will be logged at DEBUG2 level.  I think it is better to add in the docs that we don't guarantee that the number of workers the user has asked for or expected to use for a parallel vacuum will be available during execution.  Even if there is a compelling reason (which I don't see) to log this information, I think we shouldn't use more than one message to log it (there is no need for separate messages for cleanup and vacuuming).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Oct 4, 2019 at 7:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I'd also prefer to use maintenance_work_mem at max during parallel
>> vacuum regardless of the number of parallel workers. This is current
>> implementation. In lazy vacuum the maintenance_work_mem is used to
>> record itempointer of dead tuples. This is done by leader process and
>> worker processes just refers them for vacuuming dead index tuples.
>> Even if user sets a small amount of maintenance_work_mem the parallel
>> vacuum would be helpful as it still would take a time for index
>> vacuuming. So I thought we should cap the number of parallel workers
>> by the number of indexes rather than maintenance_work_mem.
>>
>
> Isn't that true only if we never use maintenance_work_mem during index cleanup?  However, I think we are using during index cleanup, see forex. ginInsertCleanup.  I think before reaching any conclusion about what to do about this, first we need to establish whether this is a problem.  If I am correct, then only some of the index cleanups (like gin index) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
>

I got your point. Currently the single process lazy vacuum could
consume the amount of (maintenance_work_mem * 2) memory at max because
we do index cleanup during holding the dead tuple space as you
mentioned. And ginInsertCleanup is also be called at the beginning of
ginbulkdelete. In current parallel lazy vacuum, each parallel vacuum
worker could consume other memory apart from the memory used by heap
scan depending on the implementation of target index AM. Given that
the current single and parallel vacuum implementation it would be
better to control the amount memory in total rather than the number of
parallel workers. So one approach I came up with is that we make all
vacuum workers use the amount of (maintenance_work_mem / # of
participants) as new maintenance_work_mem.

Yeah, we can do something like that, but I am not clear whether the current memory usage for Gin indexes is correct.  I have started a new thread, let's discuss there.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sun, Oct 6, 2019 at 7:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 7:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>
>> >> I'd also prefer to use maintenance_work_mem at max during parallel
>> >> vacuum regardless of the number of parallel workers. This is current
>> >> implementation. In lazy vacuum the maintenance_work_mem is used to
>> >> record itempointer of dead tuples. This is done by leader process and
>> >> worker processes just refers them for vacuuming dead index tuples.
>> >> Even if user sets a small amount of maintenance_work_mem the parallel
>> >> vacuum would be helpful as it still would take a time for index
>> >> vacuuming. So I thought we should cap the number of parallel workers
>> >> by the number of indexes rather than maintenance_work_mem.
>> >>
>> >
>> > Isn't that true only if we never use maintenance_work_mem during index cleanup?  However, I think we are using
>> > during index cleanup, see forex. ginInsertCleanup.  I think before reaching any conclusion about what to do about this,
>> > first we need to establish whether this is a problem.  If I am correct, then only some of the index cleanups (like gin
>> > index) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
>> >
>>
>> I got your point. Currently the single process lazy vacuum could
>> consume the amount of (maintenance_work_mem * 2) memory at max because
>> we do index cleanup during holding the dead tuple space as you
>> mentioned. And ginInsertCleanup is also be called at the beginning of
>> ginbulkdelete. In current parallel lazy vacuum, each parallel vacuum
>> worker could consume other memory apart from the memory used by heap
>> scan depending on the implementation of target index AM. Given that
>> the current single and parallel vacuum implementation it would be
>> better to control the amount memory in total rather than the number of
>> parallel workers. So one approach I came up with is that we make all
>> vacuum workers use the amount of (maintenance_work_mem / # of
>> participants) as new maintenance_work_mem.
>
>
> Yeah, we can do something like that, but I am not clear whether the current memory usage for Gin indexes is correct.
> I have started a new thread, let's discuss there.
>

Thank you for starting that discussion!

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, Oct 5, 2019 at 8:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>
>> >
>> > Do we really need to log all those messages?  The other places where we launch parallel workers doesn't seem to be
>> > using such messages.  Why do you think it is important to log the messages here when other cases don't use it?
>>
>> Well I would rather think that parallel create index doesn't log
>> enough messages. Parallel maintenance operation is invoked manually by
>> user. I can imagine that DBA wants to cancel and try the operation
>> again later if enough workers are not launched. But there is not a
>> convenient way to confirm how many parallel workers planned and
>> actually launched. We need to see ps command or pg_stat_activity.
>> That's why I think that log message would be helpful for users.
>
>
> Hmm, what is a guarantee at a later time the user will get the required number of workers?  I think if the user
> decides to vacuum, then she would want it to start sooner.  Also, to cancel the vacuum, for this reason, the user needs
> to monitor logs which don't seem to be an easy thing considering this information will be logged at DEBUG2 level.  I
> think it is better to add in docs that we don't guarantee that the number of workers the user has asked or expected to
> use for a parallel vacuum will be available during execution.  Even if there is a compelling reason (which I don't see)
> to log this information, I think we shouldn't use more than one message to log (like there is no need for a separate
> message for cleanup and vacuuming) this information.
>

I think there is a use case where a user wants to cancel a
long-running analytic query that is using parallel workers so that
those workers can be used for parallel vacuum instead; that way the
lazy vacuum will complete sooner. Or a user might want to check the
vacuum log to see how many parallel workers the lazy vacuum used, as a
diagnostic when the vacuum took a long time. This log information
appears when the VERBOSE option is specified. When executing the
VACUUM command it's quite common to specify the VERBOSE option to see
the vacuum execution in more detail, and VACUUM VERBOSE already emits
very detailed information such as how many frozen pages were skipped
and the OldestXmin. So I don't think this information would be too odd
there. Are you concerned that this information takes many lines of
code, or that it's not worth logging?

I agree with adding to the docs that we don't guarantee that the
number of workers the user requested will be available.

--
Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Oct 7, 2019 at 10:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Oct 5, 2019 at 8:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>
>> >
>> > Do we really need to log all those messages?  The other places where we launch parallel workers doesn't seem to be using such messages.  Why do you think it is important to log the messages here when other cases don't use it?
>>
>> Well I would rather think that parallel create index doesn't log
>> enough messages. Parallel maintenance operation is invoked manually by
>> user. I can imagine that DBA wants to cancel and try the operation
>> again later if enough workers are not launched. But there is not a
>> convenient way to confirm how many parallel workers planned and
>> actually launched. We need to see ps command or pg_stat_activity.
>> That's why I think that log message would be helpful for users.
>
>
> Hmm, what is a guarantee at a later time the user will get the required number of workers?  I think if the user decides to vacuum, then she would want it to start sooner.  Also, to cancel the vacuum, for this reason, the user needs to monitor logs which don't seem to be an easy thing considering this information will be logged at DEBUG2 level.  I think it is better to add in docs that we don't guarantee that the number of workers the user has asked or expected to use for a parallel vacuum will be available during execution.  Even if there is a compelling reason (which I don't see)  to log this information, I think we shouldn't use more than one message to log (like there is no need for a separate message for cleanup and vacuuming) this information.
>

I think that there is use case where user wants to cancel a
long-running analytic query using parallel workers to use parallel
workers for parallel vacuum instead. That way the lazy vacuum will
eventually complete soon. Or user would want to see the vacuum log to
check if lazy vacuum has been done with how many parallel workers for
diagnostic when the vacuum took a long time. This log information
appears when VERBOSE option is specified. When executing VACUUM
command it's quite common to specify VERBOSE option to see the vacuum
execution more details and VACUUM VERBOSE already emits very detailed
information such as how many frozen pages are skipped and OldestXmin.
So I think this information would not be too odd for that. Are you
concerned that this information takes many lines of code? or it's not
worth to be logged?

To an extent both, but I see the point you are making.  So, we should try to minimize the number of lines used to log this message.  If we can use just one message to log this information, that would be ideal.
 

I agreed to add in docs that we don't guarantee that the number of
workers user requested will be available.

Okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 4, 2019 at 7:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
> Some more comments..

Thank you!

> 1.
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + if (!for_cleanup)
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + else
> + {
> + /* Cleanup one index and update index statistics */
> + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
> +    vacrelstats->tupcount_pages < vacrelstats->rel_pages);
> +
> + lazy_update_index_statistics(Irel[idx], stats[idx]);
> +
> + if (stats[idx])
> + pfree(stats[idx]);
> + }
>
> I think instead of checking for_cleanup variable for every index of
> the loop we better move loop inside, like shown below?

Fixed.

>
> if (!for_cleanup)
> for (idx = 0; idx < nindexes; idx++)
> lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> else
> for (idx = 0; idx < nindexes; idx++)
> {
> lazy_cleanup_index
> lazy_update_index_statistics
> ...
> }
>
> 2.
> +static void
> +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
> +    int nindexes, IndexBulkDeleteResult **stats,
> +    LVParallelState *lps, bool for_cleanup)
> +{
> + int idx;
> +
> + Assert(!IsParallelWorker());
> +
> + /* no job if the table has no index */
> + if (nindexes <= 0)
> + return;
>
> Wouldn't it be good idea to call this function only if nindexes > 0?
>

I realized the callers of this function should pass nindexes > 0
because they attempt to do index vacuuming or index cleanup. So it
should be an assertion rather than returning. Thoughts?
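
That is, simply (sketch):

    /* callers guarantee that the table has at least one index */
    Assert(nindexes > 0);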

> 3.
> +/*
> + * Vacuum or cleanup indexes with parallel workers. This function must be used
> + * by the parallel vacuum leader process.
> + */
> +static void
> +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
> Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps, bool for_cleanup)
>
> If you see this function there is no much common code between
> for_cleanup and without for_cleanup except these 3-4 statement.
> LaunchParallelWorkers(lps->pcxt);
> /* Create the log message to report */
> initStringInfo(&buf);
> ...
> /* Wait for all vacuum workers to finish */
> WaitForParallelWorkersToFinish(lps->pcxt);
>
> Other than that you have got a lot of checks like this
> + if (!for_cleanup)
> + {
> + }
> + else
> + {
> }
>
> I think code would be much redable if we have 2 functions one for
> vaccum (lazy_parallel_vacuum_indexes) and another for
> cleanup(lazy_parallel_cleanup_indexes).

Seems good idea. Fixed.

>
> 4.
>  * of index scans performed.  So we don't use maintenance_work_mem memory for
>   * the TID array, just enough to hold as many heap tuples as fit on one page.
>   *
> + * Lazy vacuum supports parallel execution with parallel worker processes. In
> + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
> + * parallel worker processes. Individual indexes are processed by one vacuum
>
> Spacing after the "." is not uniform, previous comment is using 2
> space and newly
> added is using 1 space.
>

Fixed.

The code has been fixed in my local repository. After incorporating
all the comments I've got so far, I'll submit the updated version of
the patch.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, Oct 5, 2019 at 4:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Few more comments
> ----------------------------
>
> 1.
> +static int
> +compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
> +{
> + int parallel_workers;
> + bool leaderparticipates = true;
>
> Seems like this function is not using onerel parameter so we can remove this.
>

Fixed.

>
> 2.
> +
> + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> + maxtuples = compute_max_dead_tuples(nblocks, true);
> + est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples,
> +    mul_size(sizeof(ItemPointerData), maxtuples)));
> + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
> +
> + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> + querylen = strlen(debug_query_string);
>
> for consistency with other comments change
> VACUUM_KEY_QUERY_TEXT  to PARALLEL_VACUUM_KEY_QUERY_TEXT
>

Fixed.

>
> 3.
> @@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
>   (!wraparound ? VACOPT_SKIP_LOCKED : 0);
>   tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT;
>   tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
> + /* parallel lazy vacuum is not supported for autovacuum */
> + tab->at_params.nworkers = -1;
>
> What is the reason for the same?  Can we explain in the comments?

I think it's just that we don't want to support parallel autovacuum
because it can consume more CPU resources despite being a background job,
which might be unexpected behavior for autovacuum. I've changed the
comment.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 4, 2019 at 8:55 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
> >> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> >
> One comment:

Thank you for reviewing this patch.

> We can check if parallel_workers is within range, something like within
> MAX_PARALLEL_WORKER_LIMIT.
> + int parallel_workers = 0;
> +
> + if (optarg != NULL)
> + {
> + parallel_workers = atoi(optarg);
> + if (parallel_workers <= 0)
> + {
> + pg_log_error("number of parallel workers must be at least 1");
> + exit(1);
> + }
> + }
>

Fixed.
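Something along these lines (just a sketch; whether MAX_PARALLEL_WORKER_LIMIT
itself is usable from vacuumdb is a separate question):

parallel_workers = atoi(optarg);
if (parallel_workers <= 0)
{
    pg_log_error("number of parallel workers must be at least 1");
    exit(1);
}
if (parallel_workers > MAX_PARALLEL_WORKER_LIMIT)
{
    pg_log_error("number of parallel workers must not exceed %d",
                 MAX_PARALLEL_WORKER_LIMIT);
    exit(1);
}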

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Oct 9, 2019 at 6:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Oct 5, 2019 at 4:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > 3.
> > @@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
> >   (!wraparound ? VACOPT_SKIP_LOCKED : 0);
> >   tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT;
> >   tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
> > + /* parallel lazy vacuum is not supported for autovacuum */
> > + tab->at_params.nworkers = -1;
> >
> > What is the reason for the same?  Can we explain in the comments?
>
> I think it's just that we don't want to support parallel autovacuum
> because it can consume more CPU resources despite being a background job,
> which might be unexpected behavior for autovacuum.
>

I think the other reason is that it can generate a lot of I/O, which might
choke other operations.  If we want, we can provide GUC(s) to control
such behavior, but initially providing it via the command should
be a good start so that users can knowingly use it in appropriate
cases.  We can later extend it to autovacuum if required.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>

Few more comments:
---------------------------------
1.  Currently, parallel vacuum is allowed for temporary relations,
which is wrong.  It leads to the below error:

postgres=# create temporary table tmp_t1(c1 int, c2 char(10));
CREATE TABLE
postgres=# create index idx_tmp_t1 on tmp_t1(c1);
CREATE INDEX
postgres=# create index idx1_tmp_t1 on tmp_t1(c2);
CREATE INDEX
postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa');
INSERT 0 10000
postgres=# delete from tmp_t1 where c1 > 5000;
DELETE 5000
postgres=# vacuum (parallel 2) tmp_t1;
ERROR:  cannot access temporary tables during a parallel operation
CONTEXT:  parallel worker

The parallel vacuum shouldn't be allowed for temporary relations.

2.
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [
<replaceable class="paramet
     SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
     INDEX_CLEANUP [ <replaceable
class="parameter">boolean</replaceable> ]
     TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
+    PARALLEL [ <replaceable
class="parameter">integer</replaceable> ]

Now, if the user gives a command like Vacuum (analyze, parallel)
<table_name>; it is not very obvious that a parallel option will be
only used for vacuum purposes but not for analyze.  I think we can add
a note in the docs to mention this explicitly.  This can avoid any
confusion.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 4, 2019 at 7:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > *
>> > +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
>> > {
>> > ..
>> > + /* Shutdown worker processes and destroy the parallel context */
>> > + WaitForParallelWorkersToFinish(lps->pcxt);
>> > ..
>> > }
>> >
>> > Do we really need to call WaitForParallelWorkersToFinish here as it
>> > must have been called in lazy_parallel_vacuum_or_cleanup_indexes
>> > before this time?
>>
>> No, removed.
>
>
> + /* Shutdown worker processes and destroy the parallel context */
> + DestroyParallelContext(lps->pcxt);
>
> But you forget to update the comment.

Fixed.

>
> Few more comments:
> --------------------------------
> 1.
> +/*
> + * Parallel Index vacuuming and index cleanup routine used by both the leader
> + * process and worker processes. Unlike single process vacuum, we don't update
> + * index statistics after cleanup index since it is not allowed during
> + * parallel mode, instead copy index bulk-deletion results from the local
> + * memory to the DSM segment and update them at the end of parallel lazy
> + * vacuum.
> + */
> +static void
> +do_parallel_vacuum_or_cleanup_indexes(Relation *Irel, int nindexes,
> +  IndexBulkDeleteResult **stats,
> +  LVShared *lvshared,
> +  LVDeadTuples *dead_tuples)
> +{
> + /* Loop until all indexes are vacuumed */
> + for (;;)
> + {
> + int idx;
> +
> + /* Get an index number to process */
> + idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
> +
> + /* Done for all indexes? */
> + if (idx >= nindexes)
> + break;
> +
> + /*
> + * Update the pointer to the corresponding bulk-deletion result
> + * if someone has already updated it.
> + */
> + if (lvshared->indstats[idx].updated &&
> + stats[idx] == NULL)
> + stats[idx] = &(lvshared->indstats[idx].stats);
> +
> + /* Do vacuum or cleanup one index */
> + if (!lvshared->for_cleanup)
> + lazy_vacuum_index(Irel[idx], &stats[idx], dead_tuples,
> +  lvshared->reltuples);
> + else
> + lazy_cleanup_index(Irel[idx], &stats[idx], lvshared->reltuples,
> +   lvshared->estimated_count);
>
> It seems we always run index cleanup via a parallel worker, which seems overkill because index cleanup generally
> scans the index only when bulkdelete was not performed.  In some cases, like for the hash index, it doesn't do
> anything even if bulkdelete is not called.  OTOH, for the brin index, it does the main job during cleanup, but we
> might be able to always allow index cleanup by a parallel worker for brin indexes if we remove the allocation in
> brinbulkdelete, which I am not sure is of any use.
>
> I think we shouldn't call cleanup via parallel worker unless bulkdelete hasn't been performed on the index.
>

Agreed. Fixed.
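The shape of the change is roughly the following (a sketch; the
"bulkdel_performed" flag is a made-up name here, not a field of the posted
patch):

if (for_cleanup && vacrelstats->bulkdel_performed)
{
    /*
     * Bulkdelete already ran on the indexes, so amvacuumcleanup is
     * usually cheap; do it serially in the leader instead of paying the
     * worker startup cost.
     */
    for (idx = 0; idx < nindexes; idx++)
        lazy_cleanup_index(Irel[idx], &stats[idx],
                           reltuples, estimated_count); /* values as elsewhere */
    return;
}

/* otherwise launch parallel workers as before */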

> 2.
> - for (i = 0; i < nindexes; i++)
> - lazy_vacuum_index(Irel[i],
> -  &indstats[i],
> -  vacrelstats);
> + lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> +   indstats, lps, false);
>
> Indentation is not proper.  You might want to run pgindent.

Fixed.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Oct 10, 2019 at 2:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
>
> Few more comments:

Thank you for reviewing the patch!

> ---------------------------------
> 1.  Currently, parallel vacuum is allowed for temporary relations,
> which is wrong.  It leads to the below error:
>
> postgres=# create temporary table tmp_t1(c1 int, c2 char(10));
> CREATE TABLE
> postgres=# create index idx_tmp_t1 on tmp_t1(c1);
> CREATE INDEX
> postgres=# create index idx1_tmp_t1 on tmp_t1(c2);
> CREATE INDEX
> postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa');
> INSERT 0 10000
> postgres=# delete from tmp_t1 where c1 > 5000;
> DELETE 5000
> postgres=# vacuum (parallel 2) tmp_t1;
> ERROR:  cannot access temporary tables during a parallel operation
> CONTEXT:  parallel worker
>
> The parallel vacuum shouldn't be allowed for temporary relations.

Fixed.
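Roughly, I added a guard like the following (a sketch only; the exact
placement and message wording may differ in the attached patch):

if (params->nworkers >= 0 && RelationUsesLocalBuffers(onerel))
{
    ereport(WARNING,
            (errmsg("skipping \"%s\" --- cannot parallel vacuum temporary tables",
                    RelationGetRelationName(onerel))));
    return;             /* skip vacuuming this relation */
}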

>
> 2.
> --- a/doc/src/sgml/ref/vacuum.sgml
> +++ b/doc/src/sgml/ref/vacuum.sgml
> @@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [
> <replaceable class="paramet
>      SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
>      INDEX_CLEANUP [ <replaceable
> class="parameter">boolean</replaceable> ]
>      TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
> +    PARALLEL [ <replaceable
> class="parameter">integer</replaceable> ]
>
> Now, if the user gives a command like Vacuum (analyze, parallel)
> <table_name>; it is not very obvious that a parallel option will be
> only used for vacuum purposes but not for analyze.  I think we can add
> a note in the docs to mention this explicitly.  This can avoid any
> confusion.

Agreed.

Attached the latest version patch although the memory usage problem is
under discussion. I'll update the patches according to the result of
that discussion.

Regards,

--
Masahiko Sawada

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
Hi

On Thu, 10 Oct 2019 at 13:18, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Oct 10, 2019 at 2:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
>
> Few more comments:

Thank you for reviewing the patch!

> ---------------------------------
> 1.  Currently, parallel vacuum is allowed for temporary relations,
> which is wrong.  It leads to the below error:
>
> postgres=# create temporary table tmp_t1(c1 int, c2 char(10));
> CREATE TABLE
> postgres=# create index idx_tmp_t1 on tmp_t1(c1);
> CREATE INDEX
> postgres=# create index idx1_tmp_t1 on tmp_t1(c2);
> CREATE INDEX
> postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa');
> INSERT 0 10000
> postgres=# delete from tmp_t1 where c1 > 5000;
> DELETE 5000
> postgres=# vacuum (parallel 2) tmp_t1;
> ERROR:  cannot access temporary tables during a parallel operation
> CONTEXT:  parallel worker
>
> The parallel vacuum shouldn't be allowed for temporary relations.

Fixed.

>
> 2.
> --- a/doc/src/sgml/ref/vacuum.sgml
> +++ b/doc/src/sgml/ref/vacuum.sgml
> @@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [
> <replaceable class="paramet
>      SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
>      INDEX_CLEANUP [ <replaceable
> class="parameter">boolean</replaceable> ]
>      TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
> +    PARALLEL [ <replaceable
> class="parameter">integer</replaceable> ]
>
> Now, if the user gives a command like Vacuum (analyze, parallel)
> <table_name>; it is not very obvious that a parallel option will be
> only used for vacuum purposes but not for analyze.  I think we can add
> a note in the docs to mention this explicitly.  This can avoid any
> confusion.

Agreed.

Attached the latest version patch although the memory usage problem is
under discussion. I'll update the patches according to the result of
that discussion.

 
I applied both patches on HEAD and did some testing. I am getting a crash while freeing memory (pfree(stats[i])).

Steps to reproduce:
Step 1) Apply both the patches and configure with below command.
./configure --with-zlib  --enable-debug --prefix=$PWD/inst/   --with-openssl CFLAGS="-ggdb3" > war && make -j 8 install > war

Step 2) Now start the server.

Step 3) Fire below commands:
create table tmp_t1(c1 int, c2 char(10));
create index idx_tmp_t1 on tmp_t1(c1);
create index idx1_tmp_t1 on tmp_t1(c2);
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
delete from tmp_t1 where c1 > 5000;
vacuum (parallel 2) tmp_t1;

Call stack:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: mahendra postgres [local] VACUUM                        '.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000a4f97a in pfree (pointer=0x10baa68) at mcxt.c:1060
1060 context->methods->free_p(context, pointer);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x0000000000a4f97a in pfree (pointer=0x10baa68) at mcxt.c:1060
#1  0x00000000004e7d13 in update_index_statistics (Irel=0x10b9808, stats=0x10b9828, nindexes=2) at vacuumlazy.c:2277
#2  0x00000000004e693f in lazy_scan_heap (onerel=0x7f8d99610d08, params=0x7ffeeaddb7f0, vacrelstats=0x10b9728, Irel=0x10b9808, nindexes=2, aggressive=false) at vacuumlazy.c:1659
#3  0x00000000004e4d25 in heap_vacuum_rel (onerel=0x7f8d99610d08, params=0x7ffeeaddb7f0, bstrategy=0x1117528) at vacuumlazy.c:431
#4  0x00000000006a71a7 in table_relation_vacuum (rel=0x7f8d99610d08, params=0x7ffeeaddb7f0, bstrategy=0x1117528) at ../../../src/include/access/tableam.h:1432
#5  0x00000000006a9899 in vacuum_rel (relid=16384, relation=0x103b308, params=0x7ffeeaddb7f0) at vacuum.c:1870
#6  0x00000000006a7c22 in vacuum (relations=0x11176b8, params=0x7ffeeaddb7f0, bstrategy=0x1117528, isTopLevel=true) at vacuum.c:425
#7  0x00000000006a77e6 in ExecVacuum (pstate=0x105f578, vacstmt=0x103b3d8, isTopLevel=true) at vacuum.c:228
#8  0x00000000008af401 in standard_ProcessUtility (pstmt=0x103b6f8, queryString=0x103a808 "vacuum (parallel 2) tmp_t1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at utility.c:670
#9  0x00000000008aec40 in ProcessUtility (pstmt=0x103b6f8, queryString=0x103a808 "vacuum (parallel 2) tmp_t1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at utility.c:360
#10 0x00000000008addbb in PortalRunUtility (portal=0x10a1a28, pstmt=0x103b6f8, isTopLevel=true, setHoldSnapshot=false, dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at pquery.c:1175
#11 0x00000000008adf9f in PortalRunMulti (portal=0x10a1a28, isTopLevel=true, setHoldSnapshot=false, dest=0x103b7d8, altdest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at pquery.c:1321
#12 0x00000000008ad55d in PortalRun (portal=0x10a1a28, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x103b7d8, altdest=0x103b7d8, completionTag=0x7ffeeaddbc50 "")
    at pquery.c:796
#13 0x00000000008a7789 in exec_simple_query (query_string=0x103a808 "vacuum (parallel 2) tmp_t1;") at postgres.c:1231
#14 0x00000000008ab8f2 in PostgresMain (argc=1, argv=0x1065b00, dbname=0x1065a28 "postgres", username=0x1065a08 "mahendra") at postgres.c:4256
#15 0x0000000000811a42 in BackendRun (port=0x105d9c0) at postmaster.c:4465
#16 0x0000000000811241 in BackendStartup (port=0x105d9c0) at postmaster.c:4156
#17 0x000000000080d7d6 in ServerLoop () at postmaster.c:1718
#18 0x000000000080d096 in PostmasterMain (argc=3, argv=0x1035270) at postmaster.c:1391
#19 0x000000000072accb in main (argc=3, argv=0x1035270) at main.c:210


I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced memory in vac_update_relstats.
    for (i = 0; i < nindexes; i++)
    {
        if (stats[i] == NULL || stats[i]->estimated_count)
            continue;

        /* Update index statistics */
        vac_update_relstats(Irel[i],
                            stats[i]->num_pages,
                            stats[i]->num_index_tuples,
                            0,
                            false,
                            InvalidTransactionId,
                            InvalidMultiXactId,
                            false);
        pfree(stats[i]);
    }

As my table has 2 indexes, we have to free both stats. When i = 0, it is freed properly, but when i = 1, vac_update_relstats is freeing memory.
(gdb) p *stats[i]
$1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000, pages_deleted = 102, pages_free = 0}
(gdb) p *stats[i]
$2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted = 0, pages_free = 0}
(gdb)

From the above data, it looks like somewhere inside vac_update_relstats we are freeing all the palloced memory. I don't know why that is.

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
>
> I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced
> memory in vac_update_relstats.
 
>     for (i = 0; i < nindexes; i++)
>     {
>         if (stats[i] == NULL || stats[i]->estimated_count)
>             continue;
>
>         /* Update index statistics */
>         vac_update_relstats(Irel[i],
>                             stats[i]->num_pages,
>                             stats[i]->num_index_tuples,
>                             0,
>                             false,
>                             InvalidTransactionId,
>                             InvalidMultiXactId,
>                             false);
>         pfree(stats[i]);
>     }
>
> As my table has 2 indexes, we have to free both stats. When i = 0, it is freed properly, but when i = 1,
> vac_update_relstats is freeing memory.
 
>>
>> (gdb) p *stats[i]
>> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000,
>> pages_deleted = 102, pages_free = 0}
 
>> (gdb) p *stats[i]
>> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0,
>> pages_deleted = 0, pages_free = 0}
 
>> (gdb)
>
>
> From the above data, it looks like somewhere inside vac_update_relstats we are freeing all the palloced memory. I
> don't know why that is.
 
>

I don't think the problem is in vac_update_relstats as we are not even
passing stats to it, so it won't be able to free it.  I think the real
problem is in the way we copy the stats from shared memory to local
memory in the function end_parallel_vacuum().  Basically, it allocates
the memory for all the index stats together and then in function
update_index_statistics,  it is trying to free memory of individual
array elements, that won't work.  I have tried to fix the allocation
in end_parallel_vacuum, see if this fixes the problem for you.   You
need to apply the attached patch atop
v28-0001-Add-parallel-option-to-VACUUM-command posted above by
Sawada-San.
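The idea of the fix, in sketch form (not the delta patch itself; "lvshared"
here stands for the shared area reachable from lps):

for (i = 0; i < nindexes; i++)
{
    if (!lvshared->indstats[i].updated)
        continue;

    /* copy from the DSM segment into a separately palloc'd chunk */
    stats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
    memcpy(stats[i], &(lvshared->indstats[i].stats),
           sizeof(IndexBulkDeleteResult));
}

That way update_index_statistics() can legitimately pfree() each element on
its own.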

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, Oct 12, 2019 at 12:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> >
> > I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced
> > memory in vac_update_relstats.
 
> >     for (i = 0; i < nindexes; i++)
> >     {
> >         if (stats[i] == NULL || stats[i]->estimated_count)
> >             continue;
> >
> >         /* Update index statistics */
> >         vac_update_relstats(Irel[i],
> >                             stats[i]->num_pages,
> >                             stats[i]->num_index_tuples,
> >                             0,
> >                             false,
> >                             InvalidTransactionId,
> >                             InvalidMultiXactId,
> >                             false);
> >         pfree(stats[i]);
> >     }
> >
> > As my table has 2 indexes, we have to free both stats. When i = 0, it is freed properly, but when i = 1,
> > vac_update_relstats is freeing memory.
 
> >>
> >> (gdb) p *stats[i]
> >> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000,
> >> pages_deleted = 102, pages_free = 0}
 
> >> (gdb) p *stats[i]
> >> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0,
> >> pages_deleted = 0, pages_free = 0}
 
> >> (gdb)
> >
> >
> > From the above data, it looks like somewhere inside vac_update_relstats we are freeing all the palloced memory. I
> > don't know why that is.
 
> >
>
> I don't think the problem is in vac_update_relstats as we are not even
> passing stats to it, so it won't be able to free it.  I think the real
> problem is in the way we copy the stats from shared memory to local
> memory in the function end_parallel_vacuum().  Basically, it allocates
> the memory for all the index stats together and then in function
> update_index_statistics,  it is trying to free memory of individual
> array elements, that won't work.  I have tried to fix the allocation
> in end_parallel_vacuum, see if this fixes the problem for you.   You
> need to apply the attached patch atop
> v28-0001-Add-parallel-option-to-VACUUM-command posted above by
> Sawada-San.

Thank you for reviewing and creating the patch!

I think the patch fixes this issue correctly. Attached the updated
version patch.

Regards,

--
Masahiko Sawada

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
Thanks, Amit, for the patch.

The crash is fixed by this patch.

Thanks and Regards
Mahendra Thalor


On Sat, Oct 12, 2019, 09:03 Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
>
> I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced memory in vac_update_relstats.
>     for (i = 0; i < nindexes; i++)
>     {
>         if (stats[i] == NULL || stats[i]->estimated_count)
>             continue;
>
>         /* Update index statistics */
>         vac_update_relstats(Irel[i],
>                             stats[i]->num_pages,
>                             stats[i]->num_index_tuples,
>                             0,
>                             false,
>                             InvalidTransactionId,
>                             InvalidMultiXactId,
>                             false);
>         pfree(stats[i]);
>     }
>
> As my table has 2 indexes, we have to free both stats. When i = 0, it is freed properly, but when i = 1, vac_update_relstats is freeing memory.
>>
>> (gdb) p *stats[i]
>> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000, pages_deleted = 102, pages_free = 0}
>> (gdb) p *stats[i]
>> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted = 0, pages_free = 0}
>> (gdb)
>
>
> From the above data, it looks like somewhere inside vac_update_relstats we are freeing all the palloced memory. I don't know why that is.
>

I don't think the problem is in vac_update_relstats as we are not even
passing stats to it, so it won't be able to free it.  I think the real
problem is in the way we copy the stats from shared memory to local
memory in the function end_parallel_vacuum().  Basically, it allocates
the memory for all the index stats together and then in function
update_index_statistics,  it is trying to free memory of individual
array elements, that won't work.  I have tried to fix the allocation
in end_parallel_vacuum, see if this fixes the problem for you.   You
need to apply the attached patch atop
v28-0001-Add-parallel-option-to-VACUUM-command posted above by
Sawada-San.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Oct 12, 2019 at 12:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> > >
>
> Thank you for reviewing and creating the patch!
>
> I think the patch fixes this issue correctly. Attached the updated
> version patch.
>

I see a much bigger problem with the way this patch collects the index
stats in shared memory.  IIUC, it allocates the shared memory (DSM)
for all the index stats, in the same way, considering its size as
IndexBulkDeleteResult.  For the first time, it gets the stats from
local memory as returned by ambulkdelete/amvacuumcleanup call and then
copies it in shared memory space.  There onwards, it always updates
the stats in shared memory by pointing each index stats to that
memory.  In this scheme, you overlooked the point that an index AM
could choose to return a larger structure of which
IndexBulkDeleteResult is just the first field.  This generally
provides a way for ambulkdelete to communicate additional private data
to amvacuumcleanup.  We use this idea in the gist index, see how
gistbulkdelete and gistvacuumcleanup works. The current design won't
work for such cases.

One idea is to change the design such that each index method provides
a method to estimate/allocate the shared memory required for the stats of
ambulkdelete/amvacuumcleanup, and then we also need an index-method-specific
function which copies the stats from local memory to
shared memory. I think this needs further investigation.
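For illustration only, the API could look something like this (these names
are invented and don't exist anywhere yet):

/* how many bytes of DSM does this AM need for its bulk-delete result? */
typedef Size (*amestimateparallelvacuum_function) (void);

/* copy the AM-specific result from local memory into the DSM area */
typedef void (*amcopyparallelvacuum_function) (IndexBulkDeleteResult *dst,
                                               IndexBulkDeleteResult *src);

An AM like gist, whose bulk-delete result embeds private data, would report
the size of its larger struct and copy the whole thing, while AMs that only
use IndexBulkDeleteResult could leave these as NULL and let the core code
fall back to the plain struct.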

I have also made a few other changes in the attached delta patch.  The
main point fixed by the attached patch is that even if we don't allow
a parallel vacuum on temporary tables, the analyze should still be able to
work if the user has asked for it.  I have changed an error message
and made a few other cosmetic changes related to comments.  Kindly include
this in the next version if you don't find any problem with the
changes.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
>
> I see a much bigger problem with the way this patch collects the index
> stats in shared memory.  IIUC, it allocates the shared memory (DSM)
> for all the index stats, in the same way, considering its size as
> IndexBulkDeleteResult.  For the first time, it gets the stats from
> local memory as returned by ambulkdelete/amvacuumcleanup call and then
> copies it in shared memory space.  There onwards, it always updates
> the stats in shared memory by pointing each index stats to that
> memory.  In this scheme, you overlooked the point that an index AM
> could choose to return a larger structure of which
> IndexBulkDeleteResult is just the first field.  This generally
> provides a way for ambulkdelete to communicate additional private data
> to amvacuumcleanup.  We use this idea in the gist index, see how
> gistbulkdelete and gistvacuumcleanup works. The current design won't
> work for such cases.
>

Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
have a few observations about those which might help us to solve this
problem for gist indexes:
1. Are we using memory context GistBulkDeleteResult->page_set_context?
 It seems to me it is not being used.
2. Each time we perform gistbulkdelete, we always seem to reset the
GistBulkDeleteResult stats, see gistvacuumscan.  So, how will it
accumulate it for the cleanup phase when the vacuum needs to call
gistbulkdelete multiple times because the available space for
dead-tuple is filled.  It seems to me like we only use the stats from
the very last call to gistbulkdelete.
3. Do we really need to give the responsibility of deleting empty
pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
do it in gistbulkdelte?  I see one advantage of postponing it till the
cleanup phase which is if somehow we can accumulate stats over
multiple calls of gistbulkdelete, but I am not sure if it is feasible.
At least, the way current code works, it seems that there is no
advantage to postpone deleting empty pages till the cleanup phase.

If we avoid postponing deleting empty pages till the cleanup phase,
then we don't have the problem for gist indexes.

This is not directly related to this patch, so we can discuss these
observations in a separate thread as well, but before that, I wanted
to check your opinion to see if this makes sense to you as this will
help us in moving this patch forward.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Oct 14, 2019 at 3:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> >
> > I see a much bigger problem with the way this patch collects the index
> > stats in shared memory.  IIUC, it allocates the shared memory (DSM)
> > for all the index stats, in the same way, considering its size as
> > IndexBulkDeleteResult.  For the first time, it gets the stats from
> > local memory as returned by ambulkdelete/amvacuumcleanup call and then
> > copies it in shared memory space.  There onwards, it always updates
> > the stats in shared memory by pointing each index stats to that
> > memory.  In this scheme, you overlooked the point that an index AM
> > could choose to return a larger structure of which
> > IndexBulkDeleteResult is just the first field.  This generally
> > provides a way for ambulkdelete to communicate additional private data
> > to amvacuumcleanup.  We use this idea in the gist index, see how
> > gistbulkdelete and gistvacuumcleanup works. The current design won't
> > work for such cases.
> >
>
> Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
> have a few observations about those which might help us to solve this
> problem for gist indexes:
> 1. Are we using memory context GistBulkDeleteResult->page_set_context?
>  It seems to me it is not being used.
To me also it appears that it's not being used.

> 2. Each time we perform gistbulkdelete, we always seem to reset the
> GistBulkDeleteResult stats, see gistvacuumscan.  So, how will it
> accumulate it for the cleanup phase when the vacuum needs to call
> gistbulkdelete multiple times because the available space for
> dead-tuple is filled.  It seems to me like we only use the stats from
> the very last call to gistbulkdelete.
IIUC, it is fine to use the stats from the latest gistbulkdelete call
because we are trying to collect the information of the empty pages
while scanning the tree.  So I think it would be fine to just use the
information collected from the latest scan otherwise we will get
duplicate information.

> 3. Do we really need to give the responsibility of deleting empty
> pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> do it in gistbulkdelte?  I see one advantage of postponing it till the
> cleanup phase which is if somehow we can accumulate stats over
> multiple calls of gistbulkdelete, but I am not sure if it is feasible.
It seems that we want to use the latest result. That might be the
reason for postponing to the cleanup phase.


> At least, the way current code works, it seems that there is no
> advantage to postpone deleting empty pages till the cleanup phase.
>
> If we avoid postponing deleting empty pages till the cleanup phase,
> then we don't have the problem for gist indexes.
>
> This is not directly related to this patch, so we can discuss these
> observations in a separate thread as well, but before that, I wanted
> to check your opinion to see if this makes sense to you as this will
> help us in moving this patch forward.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> >
> > I see a much bigger problem with the way this patch collects the index
> > stats in shared memory.  IIUC, it allocates the shared memory (DSM)
> > for all the index stats, in the same way, considering its size as
> > IndexBulkDeleteResult.  For the first time, it gets the stats from
> > local memory as returned by ambulkdelete/amvacuumcleanup call and then
> > copies it in shared memory space.  There onwards, it always updates
> > the stats in shared memory by pointing each index stats to that
> > memory.  In this scheme, you overlooked the point that an index AM
> > could choose to return a larger structure of which
> > IndexBulkDeleteResult is just the first field.  This generally
> > provides a way for ambulkdelete to communicate additional private data
> > to amvacuumcleanup.  We use this idea in the gist index, see how
> > gistbulkdelete and gistvacuumcleanup works. The current design won't
> > work for such cases.

Indeed. That's a very good point. Thank you for pointing out.

> >
>
> Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
> have a few observations about those which might help us to solve this
> problem for gist indexes:
> 1. Are we using memory context GistBulkDeleteResult->page_set_context?
>  It seems to me it is not being used.

Yes I also think this memory context is not being used.

> 2. Each time we perform gistbulkdelete, we always seem to reset the
> GistBulkDeleteResult stats, see gistvacuumscan.  So, how will it
> accumulate it for the cleanup phase when the vacuum needs to call
> gistbulkdelete multiple times because the available space for
> dead-tuple is filled.  It seems to me like we only use the stats from
> the very last call to gistbulkdelete.

I think you're right. gistbulkdelete scans all pages and collects all
internal pages and all empty pages. And then in gistvacuumcleanup it
uses them to unlink all empty pages. Currently it accumulates such
information over multiple gistbulkdelete calls because the memory context
is not switched, but I guess this code intends to use only the information
from the very last call to gistbulkdelete.

> 3. Do we really need to give the responsibility of deleting empty
> pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> do it in gistbulkdelte?  I see one advantage of postponing it till the
> cleanup phase which is if somehow we can accumulate stats over
> multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> At least, the way current code works, it seems that there is no
> advantage to postpone deleting empty pages till the cleanup phase.
>

Considering the current page-deletion strategy of the gist index, the
advantage of postponing the page deletion until the cleanup phase is
that we can do the bulk deletion in the cleanup phase, which is called at
most once. But I wonder if we can do the page deletion in a similar way
to the btree index. Or, even if we use the current strategy, I think we can
do that without passing the pages information from bulkdelete to
vacuumcleanup via GistBulkDeleteResult.

> If we avoid postponing deleting empty pages till the cleanup phase,
> then we don't have the problem for gist indexes.

Yes. But given your point, I guess that there might be
other index AMs that use the stats returned from bulkdelete in a similar
way to the gist index (i.e. using a larger structure of which
IndexBulkDeleteResult is just the first field). If we have the same
concern, the parallel vacuum still needs to deal with that as you
mentioned.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > 3. Do we really need to give the responsibility of deleting empty
> > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> > do it in gistbulkdelte?  I see one advantage of postponing it till the
> > cleanup phase which is if somehow we can accumulate stats over
> > multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> > At least, the way current code works, it seems that there is no
> > advantage to postpone deleting empty pages till the cleanup phase.
> >
>
> Considering the current strategy of page deletion of gist index the
> advantage of postponing the page deletion till the cleanup phase is
> that we can do the bulk deletion in cleanup phase which is called at
> most once. But I wonder if we can do the page deletion in the similar
> way to btree index.
>

I think there might be some advantage of the current strategy due to
which it has been chosen.  I was going through the development thread
and noticed some old email which points something related to this.
See [1].

> Or even we use the current strategy I think we can
> do that while not passing the pages information from bulkdelete to
> vacuumcleanup using by GistBulkDeleteResult.
>

Yeah, I also think so.  I have started a new thread [2] to know the
opinion of others on this matter.

> > If we avoid postponing deleting empty pages till the cleanup phase,
> > then we don't have the problem for gist indexes.
>
> Yes. But considering your pointing out I guess that there might be
> other index AMs use the stats returned from bulkdelete in the similar
> way to gist index (i.e. using more larger structure of which
> IndexBulkDeleteResult is just the first field). If we have the same
> concern the parallel vacuum still needs to deal with that as you
> mentioned.
>

Right, apart from some functions for memory allocation/estimation and
stats copy, we might need something like amcanparallelvacuum, so that
index methods can have the option to not participate in parallel
vacuum due to reasons similar to gist or something else.  I think we
can work towards this direction as this anyway seems to be required
and till we reach any conclusion for gist indexes, you can mark
amcanparallelvacuum for gist indexes as false.

[1] - https://www.postgresql.org/message-id/8548498B-6EC6-4C89-8313-107BEC437489%40yandex-team.ru
[2] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Oct 15, 2019 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > 3. Do we really need to give the responsibility of deleting empty
> > > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> > > do it in gistbulkdelte?  I see one advantage of postponing it till the
> > > cleanup phase which is if somehow we can accumulate stats over
> > > multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> > > At least, the way current code works, it seems that there is no
> > > advantage to postpone deleting empty pages till the cleanup phase.
> > >
> >
> > Considering the current strategy of page deletion of gist index the
> > advantage of postponing the page deletion till the cleanup phase is
> > that we can do the bulk deletion in cleanup phase which is called at
> > most once. But I wonder if we can do the page deletion in the similar
> > way to btree index.
> >
>
> I think there might be some advantage of the current strategy due to
> which it has been chosen.  I was going through the development thread
> and noticed some old email which points something related to this.
> See [1].

Thanks.

>
> > Or even we use the current strategy I think we can
> > do that while not passing the pages information from bulkdelete to
> > vacuumcleanup using by GistBulkDeleteResult.
> >
>
> Yeah, I also think so.  I have started a new thread [2] to know the
> opinion of others on this matter.
>

Thank you.

> > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > then we don't have the problem for gist indexes.
> >
> > Yes. But considering your pointing out I guess that there might be
> > other index AMs use the stats returned from bulkdelete in the similar
> > way to gist index (i.e. using more larger structure of which
> > IndexBulkDeleteResult is just the first field). If we have the same
> > concern the parallel vacuum still needs to deal with that as you
> > mentioned.
> >
>
> Right, apart from some functions for memory allocation/estimation and
> stats copy, we might need something like amcanparallelvacuum, so that
> index methods can have the option to not participate in parallel
> vacuum due to reasons similar to gist or something else.  I think we
> can work towards this direction as this anyway seems to be required
> and till we reach any conclusion for gist indexes, you can mark
> amcanparallelvacuum for gist indexes as false.

Agreed. I'll create a separate patch to add this callback and change the
parallel vacuum patch so that it checks the participation of indexes
and then vacuums the non-participating indexes after the parallel vacuum.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > > 3. Do we really need to give the responsibility of deleting empty
> > > > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> > > > do it in gistbulkdelte?  I see one advantage of postponing it till the
> > > > cleanup phase which is if somehow we can accumulate stats over
> > > > multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> > > > At least, the way current code works, it seems that there is no
> > > > advantage to postpone deleting empty pages till the cleanup phase.
> > > >
> > >
> > > Considering the current strategy of page deletion of gist index the
> > > advantage of postponing the page deletion till the cleanup phase is
> > > that we can do the bulk deletion in cleanup phase which is called at
> > > most once. But I wonder if we can do the page deletion in the similar
> > > way to btree index.
> > >
> >
> > I think there might be some advantage of the current strategy due to
> > which it has been chosen.  I was going through the development thread
> > and noticed some old email which points something related to this.
> > See [1].
>
> Thanks.
>
> >
> > > Or even we use the current strategy I think we can
> > > do that while not passing the pages information from bulkdelete to
> > > vacuumcleanup using by GistBulkDeleteResult.
> > >
> >
> > Yeah, I also think so.  I have started a new thread [2] to know the
> > opinion of others on this matter.
> >
>
> Thank you.
>
> > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > then we don't have the problem for gist indexes.
> > >
> > > Yes. But considering your pointing out I guess that there might be
> > > other index AMs use the stats returned from bulkdelete in the similar
> > > way to gist index (i.e. using more larger structure of which
> > > IndexBulkDeleteResult is just the first field). If we have the same
> > > concern the parallel vacuum still needs to deal with that as you
> > > mentioned.
> > >
> >
> > Right, apart from some functions for memory allocation/estimation and
> > stats copy, we might need something like amcanparallelvacuum, so that
> > index methods can have the option to not participate in parallel
> > vacuum due to reasons similar to gist or something else.  I think we
> > can work towards this direction as this anyway seems to be required
> > and till we reach any conclusion for gist indexes, you can mark
> > amcanparallelvacuum for gist indexes as false.
>
> Agreed. I'll create a separate patch to add this callback and change
> parallel vacuum patch so that it checks the participation of indexes
> and then vacuums on un-participated indexes after parallel vacuum.

amcanparallelvacuum doesn't need to be a callback; it can be a
boolean field of IndexAmRoutine.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 15, 2019 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>

> Right, apart from some functions for memory allocation/estimation and
> stats copy, we might need something like amcanparallelvacuum, so that
> index methods can have the option to not participate in parallel
> vacuum due to reasons similar to gist or something else.  I think we
> can work towards this direction as this anyway seems to be required
> and till we reach any conclusion for gist indexes, you can mark
> amcanparallelvacuum for gist indexes as false.
>
I think for estimating the size of the stats I suggest "amestimatestat"
or "amstatsize", and for copying the stats data we can add "amcopystat".  It
would be helpful for extending parallel vacuum to the indexes which
have extended stats.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > > then we don't have the problem for gist indexes.
> > > >
> > > > Yes. But considering your pointing out I guess that there might be
> > > > other index AMs use the stats returned from bulkdelete in the similar
> > > > way to gist index (i.e. using more larger structure of which
> > > > IndexBulkDeleteResult is just the first field). If we have the same
> > > > concern the parallel vacuum still needs to deal with that as you
> > > > mentioned.
> > > >
> > >
> > > Right, apart from some functions for memory allocation/estimation and
> > > stats copy, we might need something like amcanparallelvacuum, so that
> > > index methods can have the option to not participate in parallel
> > > vacuum due to reasons similar to gist or something else.  I think we
> > > can work towards this direction as this anyway seems to be required
> > > and till we reach any conclusion for gist indexes, you can mark
> > > amcanparallelvacuum for gist indexes as false.
> >
> > Agreed. I'll create a separate patch to add this callback and change
> > parallel vacuum patch so that it checks the participation of indexes
> > and then vacuums on un-participated indexes after parallel vacuum.
>
> amcanparallelvacuum is not necessary to be a callback, it can be a
> boolean field of IndexAmRoutine.
>

Yes, it will be a boolean.  Note that for parallel-index scans, we
already have amcanparallel.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > > > then we don't have the problem for gist indexes.
> > > > >
> > > > > Yes. But considering your pointing out I guess that there might be
> > > > > other index AMs use the stats returned from bulkdelete in the similar
> > > > > way to gist index (i.e. using more larger structure of which
> > > > > IndexBulkDeleteResult is just the first field). If we have the same
> > > > > concern the parallel vacuum still needs to deal with that as you
> > > > > mentioned.
> > > > >
> > > >
> > > > Right, apart from some functions for memory allocation/estimation and
> > > > stats copy, we might need something like amcanparallelvacuum, so that
> > > > index methods can have the option to not participate in parallel
> > > > vacuum due to reasons similar to gist or something else.  I think we
> > > > can work towards this direction as this anyway seems to be required
> > > > and till we reach any conclusion for gist indexes, you can mark
> > > > amcanparallelvacuum for gist indexes as false.
> > >
> > > Agreed. I'll create a separate patch to add this callback and change
> > > parallel vacuum patch so that it checks the participation of indexes
> > > and then vacuums on un-participated indexes after parallel vacuum.
> >
> > amcanparallelvacuum is not necessary to be a callback, it can be a
> > boolean field of IndexAmRoutine.
> >
>
> Yes, it will be a boolean.  Note that for parallel-index scans, we
> already have amcanparallel.
>

Attached updated patch set. The 0001 patch introduces a new index AM field,
amcanparallelvacuum; all index AMs except for gist set it to true for now.
The 0002 patch incorporates all the comments I got so far.
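For reference, the gist of 0001 is along these lines (a sketch; the exact
placement in amapi.h may differ):

typedef struct IndexAmRoutine
{
    /* ... other fields unchanged ... */
    bool        amcanparallel;       /* does AM support parallel scan? */
    bool        amcanparallelvacuum; /* can bulkdelete/vacuumcleanup run in a worker? */
    /* ... */
} IndexAmRoutine;

and in gisthandler():

    amroutine->amcanparallelvacuum = false;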

Regards,

--
Masahiko Sawada

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> Attached updated patch set. 0001 patch introduces new index AM field
> amcanparallelvacuum. All index AMs except for gist sets true for now.
> 0002 patch incorporated the all comments I got so far.
>

I haven't studied the latest patch in detail, but it seems you are
still assuming that all indexes will have the same amount of shared
memory for index stats and copying it in the same way. I thought we
agreed that each index AM should do this on its own.  The basic
problem is as of now we see this problem only with the Gist index, but
some other index AM's could also have a similar problem.

Another major problem with previous and this patch version is that the
cost-based vacuum concept seems to be entirely broken.  Basically,
each parallel vacuum worker operates independently w.r.t vacuum delay
and cost.  Assume that the overall I/O allowed for vacuum operation is
X after which it will sleep for some time, reset the balance and
continue.  In the patch, each worker will be allowed to perform X
before which it can sleep and also there is no coordination for the
same with master backend.  This is somewhat similar to memory usage
problem, but a bit more tricky because here we can't easily split the
I/O for each of the worker.

One idea could be that we somehow map vacuum costing related
parameters to the shared memory (dsm) which the vacuum operation is
using and then allow workers to coordinate.  This way master and
worker processes will have the same view of balance cost and can act
accordingly.
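Very roughly, something like the following (struct and field names are
invented just to illustrate the idea):

typedef struct LVSharedCostState
{
    pg_atomic_uint32 cost_balance;  /* shared equivalent of VacuumCostBalance */
    int         cost_limit;         /* limit for the whole vacuum operation */
} LVSharedCostState;

/* conceptually, in vacuum_delay_point(): */
if (shared_cost != NULL)
{
    uint32      balance;

    balance = pg_atomic_add_fetch_u32(&shared_cost->cost_balance,
                                      VacuumCostBalance);
    VacuumCostBalance = 0;

    if (balance >= (uint32) shared_cost->cost_limit)
    {
        /* sleep here, then reset the shared balance for the next round */
        pg_atomic_write_u32(&shared_cost->cost_balance, 0);
    }
}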

The other idea could be that we come up with some smart way to split
the I/O among workers.  Initially, I thought we could try something as
we do for autovacuum workers (see autovac_balance_cost), but I think
that will require much more math.  Before launching workers, we need
to compute the remaining I/O (heap operation would have used
something) after which we need to sleep and continue the operation and
then somehow split it equally across workers.  Once the workers are
finished, then need to let master backend know how much I/O they have
consumed and then master backend can add it to it's current I/O
consumed.

I think this problem matters because the vacuum delay is useful for
large vacuums and this patch is trying to exactly solve that problem,
so we can't ignore this problem.  I am not yet sure what is the best
solution to this problem, but I think we need to do something for it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, Oct 16, 2019 at 3:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > Attached updated patch set. 0001 patch introduces new index AM field
> > amcanparallelvacuum. All index AMs except for gist sets true for now.
> > 0002 patch incorporated the all comments I got so far.
> >
>
> I haven't studied the latest patch in detail, but it seems you are
> still assuming that all indexes will have the same amount of shared
> memory for index stats and copying it in the same way.

Yeah, I thought we agreed at least to have canparallelvacuum, so that if an
index AM cannot support parallel index vacuuming, like gist, it returns
false.

> I thought we
> agreed that each index AM should do this on its own.  The basic
> problem is as of now we see this problem only with the Gist index, but
> some other index AM's could also have a similar problem.

Okay. I'm thinking we're going to have a new callback to ask index AMs
for the size of the structure used within both ambulkdelete and
amvacuumcleanup. But copying it to DSM can be done by the core because
it knows how many bytes need to be copied to DSM. Is that okay?

>
> Another major problem with previous and this patch version is that the
> cost-based vacuum concept seems to be entirely broken.  Basically,
> each parallel vacuum worker operates independently w.r.t vacuum delay
> and cost.  Assume that the overall I/O allowed for vacuum operation is
> X after which it will sleep for some time, reset the balance and
> continue.  In the patch, each worker will be allowed to perform X
> before which it can sleep and also there is no coordination for the
> same with master backend.  This is somewhat similar to memory usage
> problem, but a bit more tricky because here we can't easily split the
> I/O for each of the worker.
>
> One idea could be that we somehow map vacuum costing related
> parameters to the shared memory (dsm) which the vacuum operation is
> using and then allow workers to coordinate.  This way master and
> worker processes will have the same view of balance cost and can act
> accordingly.
>
> The other idea could be that we come up with some smart way to split
> the I/O among workers.  Initially, I thought we could try something as
> we do for autovacuum workers (see autovac_balance_cost), but I think
> that will require much more math.  Before launching workers, we need
> to compute the remaining I/O (heap operation would have used
> something) after which we need to sleep and continue the operation and
> then somehow split it equally across workers.  Once the workers are
> finished, then need to let master backend know how much I/O they have
> consumed and then master backend can add it to it's current I/O
> consumed.
>
> I think this problem matters because the vacuum delay is useful for
> large vacuums and this patch is trying to exactly solve that problem,
> so we can't ignore this problem.  I am not yet sure what is the best
> solution to this problem, but I think we need to do something for it.
>

I guess the concept of vacuum delay contradicts the concept of
parallel vacuum. The point of parallel vacuum is to use more resources
to make vacuum faster, whereas vacuum delay throttles I/O during
vacuum in order to avoid I/O spikes; parallel vacuum instead
concentrates the I/O into a shorter duration. Since memory is shared
by the entire system we need to deal with the memory issue, but disks
are different.

If we do need to deal with this problem, how about simply dividing
vacuum_cost_limit by the parallel degree and using the result as each
worker's vacuum_cost_limit?

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
Hi,
I applied all 3 patches and ran the regression tests. I got one regression failure.

diff -U3 /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out
--- /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out 2019-10-17 10:01:58.138863802 +0530
+++ /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out 2019-10-17 11:41:20.930699926 +0530
@@ -105,7 +105,7 @@
 CREATE TEMPORARY TABLE tmp (a int PRIMARY KEY);
 CREATE INDEX tmp_idx1 ON tmp (a);
 VACUUM (PARALLEL 1) tmp; -- error, cannot parallel vacuum temporary tables
-WARNING:  skipping "tmp" --- cannot parallel vacuum temporary tables
+WARNING:  skipping vacuum on "tmp" --- cannot vacuum temporary tables in parallel
 -- INDEX_CLEANUP option
 CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT);
 -- Use uncompressed data stored in toast.

It looks like you changed the warning message for temporary tables but haven't updated the expected output file.

Thanks and Regards
Mahendra Thalor

On Wed, 16 Oct 2019 at 06:50, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > > > then we don't have the problem for gist indexes.
> > > > >
> > > > > Yes. But considering your pointing out I guess that there might be
> > > > > other index AMs use the stats returned from bulkdelete in the similar
> > > > > way to gist index (i.e. using more larger structure of which
> > > > > IndexBulkDeleteResult is just the first field). If we have the same
> > > > > concern the parallel vacuum still needs to deal with that as you
> > > > > mentioned.
> > > > >
> > > >
> > > > Right, apart from some functions for memory allocation/estimation and
> > > > stats copy, we might need something like amcanparallelvacuum, so that
> > > > index methods can have the option to not participate in parallel
> > > > vacuum due to reasons similar to gist or something else.  I think we
> > > > can work towards this direction as this anyway seems to be required
> > > > and till we reach any conclusion for gist indexes, you can mark
> > > > amcanparallelvacuum for gist indexes as false.
> > >
> > > Agreed. I'll create a separate patch to add this callback and change
> > > parallel vacuum patch so that it checks the participation of indexes
> > > and then vacuums on un-participated indexes after parallel vacuum.
> >
> > amcanparallelvacuum is not necessary to be a callback, it can be a
> > boolean field of IndexAmRoutine.
> >
>
> Yes, it will be a boolean.  Note that for parallel-index scans, we
> already have amcanparallel.
>

Attached updated patch set. 0001 patch introduces new index AM field
amcanparallelvacuum. All index AMs except for gist sets true for now.
0002 patch incorporated the all comments I got so far.

Regards,

--
Masahiko Sawada

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Oct 17, 2019 at 3:18 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
> Hi
> I applied all 3 patches and ran regression test. I was getting one regression failure.
>
>> diff -U3 /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out
/home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out
>> --- /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out 2019-10-17 10:01:58.138863802
+0530
>> +++ /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out 2019-10-17 11:41:20.930699926
+0530
>> @@ -105,7 +105,7 @@
>>  CREATE TEMPORARY TABLE tmp (a int PRIMARY KEY);
>>  CREATE INDEX tmp_idx1 ON tmp (a);
>>  VACUUM (PARALLEL 1) tmp; -- error, cannot parallel vacuum temporary tables
>> -WARNING:  skipping "tmp" --- cannot parallel vacuum temporary tables
>> +WARNING:  skipping vacuum on "tmp" --- cannot vacuum temporary tables in parallel
>>  -- INDEX_CLEANUP option
>>  CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT);
>>  -- Use uncompressed data stored in toast.
>
>
> It look likes that you changed warning message for temp table, but haven't updated expected out file.
>

Thank you!
I forgot to change the expected file. I'll fix it in the next version of the patch.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Oct 16, 2019 at 3:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > Attached updated patch set. 0001 patch introduces new index AM field
> > > amcanparallelvacuum. All index AMs except for gist sets true for now.
> > > 0002 patch incorporated the all comments I got so far.
> > >
> >
> > I haven't studied the latest patch in detail, but it seems you are
> > still assuming that all indexes will have the same amount of shared
> > memory for index stats and copying it in the same way.
>
> Yeah I thought we agreed at least to have canparallelvacuum and if an
> index AM cannot support parallel index vacuuming like gist, it returns
> false.
>
> > I thought we
> > agreed that each index AM should do this on its own.  The basic
> > problem is as of now we see this problem only with the Gist index, but
> > some other index AM's could also have a similar problem.
>
> Okay. I'm thinking we're going to have a new callback to ack index AMs
> the size of the structure using within both ambulkdelete and
> amvacuumcleanup. But copying it to DSM can be done by the core because
> it knows how many bytes need to be copied to DSM. Is that okay?
>

That sounds okay.

> >
> > Another major problem with previous and this patch version is that the
> > cost-based vacuum concept seems to be entirely broken.  Basically,
> > each parallel vacuum worker operates independently w.r.t vacuum delay
> > and cost.  Assume that the overall I/O allowed for vacuum operation is
> > X after which it will sleep for some time, reset the balance and
> > continue.  In the patch, each worker will be allowed to perform X
> > before which it can sleep and also there is no coordination for the
> > same with master backend.  This is somewhat similar to memory usage
> > problem, but a bit more tricky because here we can't easily split the
> > I/O for each of the worker.
> >
> > One idea could be that we somehow map vacuum costing related
> > parameters to the shared memory (dsm) which the vacuum operation is
> > using and then allow workers to coordinate.  This way master and
> > worker processes will have the same view of balance cost and can act
> > accordingly.
> >
> > The other idea could be that we come up with some smart way to split
> > the I/O among workers.  Initially, I thought we could try something as
> > we do for autovacuum workers (see autovac_balance_cost), but I think
> > that will require much more math.  Before launching workers, we need
> > to compute the remaining I/O (heap operation would have used
> > something) after which we need to sleep and continue the operation and
> > then somehow split it equally across workers.  Once the workers are
> > finished, then need to let master backend know how much I/O they have
> > consumed and then master backend can add it to it's current I/O
> > consumed.
> >
> > I think this problem matters because the vacuum delay is useful for
> > large vacuums and this patch is trying to exactly solve that problem,
> > so we can't ignore this problem.  I am not yet sure what is the best
> > solution to this problem, but I think we need to do something for it.
> >
>
> I guess that the concepts of vacuum delay contradicts the concepts of
> parallel vacuum. The concepts of parallel vacuum would be to use more
> resource to make vacuum faster. Vacuum delays balances I/O during
> vacuum in order to avoid I/O spikes by vacuum but parallel vacuum
> rather concentrates I/O in shorter duration.
>

You have a point, but the way it currently works in the patch doesn't
make much sense.  Basically, each of the parallel workers will be
allowed to use the complete I/O limit, which is actually the limit for
the entire vacuum operation.  It doesn't give any consideration to the
work done for the heap.

> Since we need to share
> the memory in entire system we need to deal with the memory issue but
> disks are different.
>
> If we need to deal with this problem how about just dividing
> vacuum_cost_limit by the parallel degree and setting it to worker's
> vacuum_cost_limit?
>

How will we take the I/O done on the heap into consideration?  The
vacuum_cost_limit is the cost limit for the entire vacuum operation,
not separate limits for the heap and the indexes.  What makes you think
that considering the limit for the heap and the indexes separately is
not problematic?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I guess that the concepts of vacuum delay contradicts the concepts of
> > parallel vacuum. The concepts of parallel vacuum would be to use more
> > resource to make vacuum faster. Vacuum delays balances I/O during
> > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum
> > rather concentrates I/O in shorter duration.
> >
>
> You have a point, but the way it is currently working in the patch
> doesn't make much sense.
>

Another point in this regard is that the user anyway has an option to
turn off the cost-based vacuum.  By default, it is anyway disabled.
So, if the user enables it we have to provide some sensible behavior.
If we can't come up with anything, then, in the end, we might want to
turn it off for a parallel vacuum and mention the same in docs, but I
think we should try to come up with a solution for it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I guess that the concepts of vacuum delay contradicts the concepts of
> > > parallel vacuum. The concepts of parallel vacuum would be to use more
> > > resource to make vacuum faster. Vacuum delays balances I/O during
> > > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum
> > > rather concentrates I/O in shorter duration.
> > >
> >
> > You have a point, but the way it is currently working in the patch
> > doesn't make much sense.
> >
>
> Another point in this regard is that the user anyway has an option to
> turn off the cost-based vacuum.  By default, it is anyway disabled.
> So, if the user enables it we have to provide some sensible behavior.
> If we can't come up with anything, then, in the end, we might want to
> turn it off for a parallel vacuum and mention the same in docs, but I
> think we should try to come up with a solution for it.

I finally got your point and now understand the need. And the idea I
proposed doesn't work well.

So you mean that all workers share the cost count, and if a parallel
vacuum worker increases the cost and it reaches the limit, only that
one worker sleeps? Is that okay even though the other parallel workers
are still running, so the sleep might not help?

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I guess that the concepts of vacuum delay contradicts the concepts of
> > > > parallel vacuum. The concepts of parallel vacuum would be to use more
> > > > resource to make vacuum faster. Vacuum delays balances I/O during
> > > > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum
> > > > rather concentrates I/O in shorter duration.
> > > >
> > >
> > > You have a point, but the way it is currently working in the patch
> > > doesn't make much sense.
> > >
> >
> > Another point in this regard is that the user anyway has an option to
> > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > So, if the user enables it we have to provide some sensible behavior.
> > If we can't come up with anything, then, in the end, we might want to
> > turn it off for a parallel vacuum and mention the same in docs, but I
> > think we should try to come up with a solution for it.
>
> I finally got your point and now understood the need. And the idea I
> proposed doesn't work fine.
>
> So you meant that all workers share the cost count and if a parallel
> vacuum worker increase the cost and it reaches the limit, does the
> only one worker sleep? Is that okay even though other parallel workers
> are still running and then the sleep might not help?
>
I agree with this point.  There is a possibility that some of the
workers that are doing heavy I/O continue to work while, OTOH, other
workers that are doing very little I/O become the victims and
unnecessarily delay their operations.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Another point in this regard is that the user anyway has an option to
> > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > So, if the user enables it we have to provide some sensible behavior.
> > > If we can't come up with anything, then, in the end, we might want to
> > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > think we should try to come up with a solution for it.
> >
> > I finally got your point and now understood the need. And the idea I
> > proposed doesn't work fine.
> >
> > So you meant that all workers share the cost count and if a parallel
> > vacuum worker increase the cost and it reaches the limit, does the
> > only one worker sleep? Is that okay even though other parallel workers
> > are still running and then the sleep might not help?
> >

Remember that the other running workers will also increase
VacuumCostBalance and whichever worker finds that it becomes greater
than VacuumCostLimit will reset its value and sleep.  So, won't this
make sure that overall throttling works the same?

> I agree with this point.  There is a possibility that some of the
> workers who are doing heavy I/O continue to work and OTOH other
> workers who are doing very less I/O might become the victim and
> unnecessarily delay its operation.
>

Sure, but will it impact the overall I/O?  I mean to say the rate
limit we want to provide for the overall vacuum operation will still be
the same.  Also, doesn't a similar thing happen now as well, where the
heap might have done a major portion of the I/O but, soon after we
start vacuuming the indexes, we hit the limit and sleep?

I think this might not be the perfect solution and we should try to
come up with something else if this doesn't seem to be working.  Have
you guys thought about the second solution I mentioned in email [1]
(Before launching workers, we need to compute the remaining I/O ....)?
 Any other better ideas?

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BySETHCaCnAsEC-dC4GSXaE2sNGMOgD6J%3DX%2BN43bBqJQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > Another point in this regard is that the user anyway has an option to
> > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > So, if the user enables it we have to provide some sensible behavior.
> > > > If we can't come up with anything, then, in the end, we might want to
> > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > think we should try to come up with a solution for it.
> > >
> > > I finally got your point and now understood the need. And the idea I
> > > proposed doesn't work fine.
> > >
> > > So you meant that all workers share the cost count and if a parallel
> > > vacuum worker increase the cost and it reaches the limit, does the
> > > only one worker sleep? Is that okay even though other parallel workers
> > > are still running and then the sleep might not help?
> > >
>
> Remember that the other running workers will also increase
> VacuumCostBalance and whichever worker finds that it becomes greater
> than VacuumCostLimit will reset its value and sleep.  So, won't this
> make sure that overall throttling works the same?
>
> > I agree with this point.  There is a possibility that some of the
> > workers who are doing heavy I/O continue to work and OTOH other
> > workers who are doing very less I/O might become the victim and
> > unnecessarily delay its operation.
> >
>
> Sure, but will it impact the overall I/O?  I mean to say the rate
> limit we want to provide for overall vacuum operation will still be
> the same.  Also, isn't a similar thing happens now also where heap
> might have done a major portion of I/O but soon after we start
> vacuuming the index, we will hit the limit and will sleep.

Actually, what I meant is that the worker performing the actual I/O
might not go for the delay, while another worker that has done only CPU
work might pay the penalty.  So basically the worker doing
CPU-intensive work might go for the delay and pay the penalty while
the worker performing the actual I/O continues to work and does
further I/O.  Do you think this is not a practical problem?

Stepping back a bit, OTOH, I think that we cannot guarantee that the
worker that has done more I/O will continue to do further I/O and that
the one that has not done much I/O will not perform more I/O in the
future.  So it might not be too bad if we compute shared costs as you
suggested above.

>
> I think this might not be the perfect solution and we should try to
> come up with something else if this doesn't seem to be working.  Have
> you guys thought about the second solution I mentioned in email [1]
> (Before launching workers, we need to compute the remaining I/O ....)?
>  Any other better ideas?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > Another point in this regard is that the user anyway has an option to
> > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > think we should try to come up with a solution for it.
> > > >
> > > > I finally got your point and now understood the need. And the idea I
> > > > proposed doesn't work fine.
> > > >
> > > > So you meant that all workers share the cost count and if a parallel
> > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > only one worker sleep? Is that okay even though other parallel workers
> > > > are still running and then the sleep might not help?
> > > >
> >
> > Remember that the other running workers will also increase
> > VacuumCostBalance and whichever worker finds that it becomes greater
> > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > make sure that overall throttling works the same?
> >
> > > I agree with this point.  There is a possibility that some of the
> > > workers who are doing heavy I/O continue to work and OTOH other
> > > workers who are doing very less I/O might become the victim and
> > > unnecessarily delay its operation.
> > >
> >
> > Sure, but will it impact the overall I/O?  I mean to say the rate
> > limit we want to provide for overall vacuum operation will still be
> > the same.  Also, isn't a similar thing happens now also where heap
> > might have done a major portion of I/O but soon after we start
> > vacuuming the index, we will hit the limit and will sleep.
>
> Actually, What I meant is that the worker who performing actual I/O
> might not go for the delay and another worker which has done only CPU
> operation might pay the penalty?  So basically the worker who is doing
> CPU intensive operation might go for the delay and pay the penalty and
> the worker who is performing actual I/O continues to work and do
> further I/O.  Do you think this is not a practical problem?
>

I don't know.  Generally, we try to delay (if required) before
processing (reading/writing) a page, which means it will happen for
I/O-intensive operations, so I am not sure the point you are making is
completely correct.

> Stepping back a bit,  OTOH, I think that we can not guarantee that the
> one worker who has done more I/O will continue to do further I/O and
> the one which has not done much I/O will not perform more I/O in
> future.  So it might not be too bad if we compute shared costs as you
> suggested above.
>

I am thinking that if we can write patches for both approaches (a.
compute shared costs and delay based on that; b. divide the I/O cost
among the workers as described in the email above [1]) and do some
tests to see the behavior of throttling, that might help us decide
which is the best strategy to solve this problem, if any.  What do you
think?


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BySETHCaCnAsEC-dC4GSXaE2sNGMOgD6J%3DX%2BN43bBqJQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > think we should try to come up with a solution for it.
> > > > >
> > > > > I finally got your point and now understood the need. And the idea I
> > > > > proposed doesn't work fine.
> > > > >
> > > > > So you meant that all workers share the cost count and if a parallel
> > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > are still running and then the sleep might not help?
> > > > >
> > >
> > > Remember that the other running workers will also increase
> > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > make sure that overall throttling works the same?
> > >
> > > > I agree with this point.  There is a possibility that some of the
> > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > workers who are doing very less I/O might become the victim and
> > > > unnecessarily delay its operation.
> > > >
> > >
> > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > limit we want to provide for overall vacuum operation will still be
> > > the same.  Also, isn't a similar thing happens now also where heap
> > > might have done a major portion of I/O but soon after we start
> > > vacuuming the index, we will hit the limit and will sleep.
> >
> > Actually, What I meant is that the worker who performing actual I/O
> > might not go for the delay and another worker which has done only CPU
> > operation might pay the penalty?  So basically the worker who is doing
> > CPU intensive operation might go for the delay and pay the penalty and
> > the worker who is performing actual I/O continues to work and do
> > further I/O.  Do you think this is not a practical problem?
> >
>
> I don't know.  Generally, we try to delay (if required) before
> processing (read/write) one page which means it will happen for I/O
> intensive operations, so I am not sure if the point you are making is
> completely correct.

Ok, I agree with the point that we check it only when we are doing an
I/O operation.  But we also need to consider that each I/O operation
has a different weight.  So even if we have a delay point at each I/O
operation, there is a possibility that we delay a worker that is just
reading buffers with page hits (VacuumCostPageHit), while the other
worker, which is actually dirtying pages (VacuumCostPageDirty = 20),
continues the work and does more I/O.
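
Just to make the weighting concrete, here is a tiny standalone sketch
of how differently the balance grows for those two kinds of workers
(the constants mirror the classic defaults vacuum_cost_page_hit = 1
and vacuum_cost_page_dirty = 20; this is only an illustration, not
server code):

#include <stdio.h>

/* Illustrative cost-based vacuum constants. */
enum
{
    COST_PAGE_HIT = 1,
    COST_PAGE_DIRTY = 20,
    COST_LIMIT = 200
};

int
main(void)
{
    int         balance_hit_only = 0;   /* worker that only hits pages in shared buffers */
    int         balance_dirtying = 0;   /* worker that dirties every page it touches */
    int         pages;

    for (pages = 0; balance_hit_only < COST_LIMIT; pages++)
        balance_hit_only += COST_PAGE_HIT;
    printf("hit-only worker reaches the limit after %d pages\n", pages);    /* 200 */

    for (pages = 0; balance_dirtying < COST_LIMIT; pages++)
        balance_dirtying += COST_PAGE_DIRTY;
    printf("dirtying worker reaches the limit after %d pages\n", pages);    /* 10 */

    return 0;
}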

>
> > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > one worker who has done more I/O will continue to do further I/O and
> > the one which has not done much I/O will not perform more I/O in
> > future.  So it might not be too bad if we compute shared costs as you
> > suggested above.
> >
>
> I am thinking if we can write the patch for both the approaches (a.
> compute shared costs and try to delay based on that, b. try to divide
> the I/O cost among workers as described in the email above[1]) and do
> some tests to see the behavior of throttling, that might help us in
> deciding what is the best strategy to solve this problem, if any.
> What do you think?

I agree with this idea.  I can come up with a POC patch for approach
(b).  Meanwhile, if someone is interested in quickly hacking up
approach (a), then we can do some testing and compare.  Sawada-san,
by any chance would you be interested in writing a POC for approach
(a)?  Otherwise, I will try to write it after finishing the first one
(approach b).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 18, 2019 at 3:48 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > think we should try to come up with a solution for it.
> > > > > >
> > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > proposed doesn't work fine.
> > > > > >
> > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > are still running and then the sleep might not help?
> > > > > >
> > > >
> > > > Remember that the other running workers will also increase
> > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > make sure that overall throttling works the same?
> > > >
> > > > > I agree with this point.  There is a possibility that some of the
> > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > workers who are doing very less I/O might become the victim and
> > > > > unnecessarily delay its operation.
> > > > >
> > > >
> > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > limit we want to provide for overall vacuum operation will still be
> > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > might have done a major portion of I/O but soon after we start
> > > > vacuuming the index, we will hit the limit and will sleep.
> > >
> > > Actually, What I meant is that the worker who performing actual I/O
> > > might not go for the delay and another worker which has done only CPU
> > > operation might pay the penalty?  So basically the worker who is doing
> > > CPU intensive operation might go for the delay and pay the penalty and
> > > the worker who is performing actual I/O continues to work and do
> > > further I/O.  Do you think this is not a practical problem?
> > >
> >
> > I don't know.  Generally, we try to delay (if required) before
> > processing (read/write) one page which means it will happen for I/O
> > intensive operations, so I am not sure if the point you are making is
> > completely correct.
>
> Ok, I agree with the point that we are checking it only when we are
> doing the I/O operation.  But, we also need to consider that each I/O
> operations have a different weightage.  So even if we have a delay
> point at I/O operation there is a possibility that we might delay the
> worker which is just performing read buffer with page
> hit(VacuumCostPageHit).  But, the other worker who is actually
> dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> more I/O.
>
> >
> > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > one worker who has done more I/O will continue to do further I/O and
> > > the one which has not done much I/O will not perform more I/O in
> > > future.  So it might not be too bad if we compute shared costs as you
> > > suggested above.
> > >
> >
> > I am thinking if we can write the patch for both the approaches (a.
> > compute shared costs and try to delay based on that, b. try to divide
> > the I/O cost among workers as described in the email above[1]) and do
> > some tests to see the behavior of throttling, that might help us in
> > deciding what is the best strategy to solve this problem, if any.
> > What do you think?
>
> I agree with this idea.  I can come up with a POC patch for approach
> (b).  Meanwhile, if someone is interested to quickly hack with the
> approach (a) then we can do some testing and compare.  Sawada-san,
> by any chance will you be interested to write POC with approach (a)?

Yes, I will try to write the PoC patch with approach (a).

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > think we should try to come up with a solution for it.
> > > > > >
> > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > proposed doesn't work fine.
> > > > > >
> > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > are still running and then the sleep might not help?
> > > > > >
> > > >
> > > > Remember that the other running workers will also increase
> > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > make sure that overall throttling works the same?
> > > >
> > > > > I agree with this point.  There is a possibility that some of the
> > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > workers who are doing very less I/O might become the victim and
> > > > > unnecessarily delay its operation.
> > > > >
> > > >
> > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > limit we want to provide for overall vacuum operation will still be
> > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > might have done a major portion of I/O but soon after we start
> > > > vacuuming the index, we will hit the limit and will sleep.
> > >
> > > Actually, What I meant is that the worker who performing actual I/O
> > > might not go for the delay and another worker which has done only CPU
> > > operation might pay the penalty?  So basically the worker who is doing
> > > CPU intensive operation might go for the delay and pay the penalty and
> > > the worker who is performing actual I/O continues to work and do
> > > further I/O.  Do you think this is not a practical problem?
> > >
> >
> > I don't know.  Generally, we try to delay (if required) before
> > processing (read/write) one page which means it will happen for I/O
> > intensive operations, so I am not sure if the point you are making is
> > completely correct.
>
> Ok, I agree with the point that we are checking it only when we are
> doing the I/O operation.  But, we also need to consider that each I/O
> operations have a different weightage.  So even if we have a delay
> point at I/O operation there is a possibility that we might delay the
> worker which is just performing read buffer with page
> hit(VacuumCostPageHit).  But, the other worker who is actually
> dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> more I/O.
>
> >
> > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > one worker who has done more I/O will continue to do further I/O and
> > > the one which has not done much I/O will not perform more I/O in
> > > future.  So it might not be too bad if we compute shared costs as you
> > > suggested above.
> > >
> >
> > I am thinking if we can write the patch for both the approaches (a.
> > compute shared costs and try to delay based on that, b. try to divide
> > the I/O cost among workers as described in the email above[1]) and do
> > some tests to see the behavior of throttling, that might help us in
> > deciding what is the best strategy to solve this problem, if any.
> > What do you think?
>
> I agree with this idea.  I can come up with a POC patch for approach
> (b).  Meanwhile, if someone is interested to quickly hack with the
> approach (a) then we can do some testing and compare.  Sawada-san,
> by any chance will you be interested to write POC with approach (a)?
> Otherwise, I will try to write it after finishing the first one
> (approach b).
>
I have come up with the POC for approach (a).

The idea is:
1) Before launching the workers, divide the current VacuumCostBalance
among the workers so that each worker starts accumulating its balance
from that point.
2) Also, divide VacuumCostLimit among the workers.
3) Once a worker is done with its index vacuum, it sends the remaining
balance back to the leader.
4) The leader sums all the returned balances, adds that to its current
VacuumCostBalance, and starts accumulating its balance from that
point.  (A rough sketch of this scheme follows.)
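
A minimal standalone sketch of this division scheme (illustrative
only; the actual POC works on VacuumCostBalance/VacuumCostLimit inside
the vacuum code, and the names below are made up):

#include <stdio.h>

typedef struct WorkerCost
{
    int         cost_balance;   /* starting balance handed out by the leader */
    int         cost_limit;     /* this worker's share of the limit */
} WorkerCost;

/* Leader side: split the current balance and the limit across workers. */
static void
divide_cost(int leader_balance, int cost_limit, WorkerCost *workers, int nworkers)
{
    for (int i = 0; i < nworkers; i++)
    {
        workers[i].cost_balance = leader_balance / nworkers;
        workers[i].cost_limit = cost_limit / nworkers;
    }
}

/* Leader side: once the workers finish, take back whatever balance is left. */
static int
collect_remaining(const WorkerCost *workers, int nworkers)
{
    int         total = 0;

    for (int i = 0; i < nworkers; i++)
        total += workers[i].cost_balance;
    return total;
}

int
main(void)
{
    WorkerCost  workers[2];
    int         leader_balance = 90;    /* balance accumulated by the heap scan */
    int         cost_limit = 200;       /* vacuum_cost_limit of the whole operation */

    divide_cost(leader_balance, cost_limit, workers, 2);

    /* ... each worker vacuums indexes, throttling against its own share ...
     * (in this sketch nothing is consumed, so the full balance comes back) */

    leader_balance = collect_remaining(workers, 2);
    printf("leader resumes with balance %d against limit %d\n",
           leader_balance, cost_limit);
    return 0;
}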

I was trying to test the behaviour of the vacuum I/O limit, but I
could not find an easy way to test it, so I just put a tracepoint in
the code and checked at what point we apply the delay.
I also printed the cost balance at various points to see after how
much I/O accumulation we hit the delay.  Please feel free to suggest a
better way to test this.

I have printed these logs for the parallel vacuum patch (v30) vs v30 +
the patch for dividing the I/O limit (attached with this mail).

Note: The patch and the test results are attached.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I am thinking if we can write the patch for both the approaches (a.
> > > compute shared costs and try to delay based on that, b. try to divide
> > > the I/O cost among workers as described in the email above[1]) and do
> > > some tests to see the behavior of throttling, that might help us in
> > > deciding what is the best strategy to solve this problem, if any.
> > > What do you think?
> >
> > I agree with this idea.  I can come up with a POC patch for approach
> > (b).  Meanwhile, if someone is interested to quickly hack with the
> > approach (a) then we can do some testing and compare.  Sawada-san,
> > by any chance will you be interested to write POC with approach (a)?
> > Otherwise, I will try to write it after finishing the first one
> > (approach b).
> >
> I have come up with the POC for approach (a).
>

I think you mean to say approach (b).

> The idea is
> 1) Before launching the worker divide the current VacuumCostBalance
> among workers so that workers start accumulating the balance from that
> point.
> 2) Also, divide the VacuumCostLimit among the workers.
> 3) Once the worker are done with the index vacuum, send back the
> remaining balance with the leader.
> 4) The leader will sum all the balances and add that to its current
> VacuumCostBalance.  And start accumulating its balance from this
> point.
>
> I was trying to test how is the behaviour of the vacuum I/O limit, but
> I could not find an easy way to test that so I just put the tracepoint
> in the code and just checked that at what point we are giving the
> delay.
> I also printed the cost balance at various point to see that after how
> much I/O accumulation we are hitting the delay.  Please feel free to
> suggest a better way to test this.
>

Can we compute the overall throttling (sleep time) of the operation
separately for the heap and the indexes, then divide the indexes'
sleep time by the number of workers and add it to the heap's sleep
time?  Then it will be a bit easier to compare the data between the
parallel and non-parallel cases.
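
For instance, the comparison figure could be computed along these
lines (a sketch of the arithmetic only, not code from any patch):

#include <stdio.h>

/* Sketch: reduce the measured sleeps to one figure comparable across runs. */
static double
comparable_sleep(double heap_sleep, double index_sleep, int nworkers)
{
    return heap_sleep + index_sleep / nworkers;
}

int
main(void)
{
    /* e.g. 12s of heap sleeps plus 9s of index sleeps spread over 3 workers */
    printf("%.1f s\n", comparable_sleep(12.0, 9.0, 3));    /* 15.0 s */
    return 0;
}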

> I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> patch for dividing i/o limit (attached with the mail)
>
> Note: Patch and the test results are attached.
>

I think it is always a good idea to summarize the results and state
your conclusion about them.  AFAICT, it seems to me this technique as
done in the patch might not work for cases where there is an uneven
amount of work done by the parallel workers (say the index sizes vary,
maybe due to partial indexes, index column width, or some other
reason).  The reason for it is that when a worker finishes its work
we don't rebalance the cost among the other workers.  Can we generate
such a test and see how it behaves?  I think it might be possible to
address this if it turns out to be a problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > I am thinking if we can write the patch for both the approaches (a.
> > > > compute shared costs and try to delay based on that, b. try to divide
> > > > the I/O cost among workers as described in the email above[1]) and do
> > > > some tests to see the behavior of throttling, that might help us in
> > > > deciding what is the best strategy to solve this problem, if any.
> > > > What do you think?
> > >
> > > I agree with this idea.  I can come up with a POC patch for approach
> > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > by any chance will you be interested to write POC with approach (a)?
> > > Otherwise, I will try to write it after finishing the first one
> > > (approach b).
> > >
> > I have come up with the POC for approach (a).
> >
>
> I think you mean to say approach (b).

Yeah, sorry for the confusion.  It's approach (b).
>
> > The idea is
> > 1) Before launching the worker divide the current VacuumCostBalance
> > among workers so that workers start accumulating the balance from that
> > point.
> > 2) Also, divide the VacuumCostLimit among the workers.
> > 3) Once the worker are done with the index vacuum, send back the
> > remaining balance with the leader.
> > 4) The leader will sum all the balances and add that to its current
> > VacuumCostBalance.  And start accumulating its balance from this
> > point.
> >
> > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > I could not find an easy way to test that so I just put the tracepoint
> > in the code and just checked that at what point we are giving the
> > delay.
> > I also printed the cost balance at various point to see that after how
> > much I/O accumulation we are hitting the delay.  Please feel free to
> > suggest a better way to test this.
> >
>
> Can we compute the overall throttling (sleep time) in the operation
> separately for heap and index, then divide the index's sleep_time with
> a number of workers and add it to heap's sleep time?  Then, it will be
> a bit easier to compare the data between parallel and non-parallel
> case.

Okay, I will try to do that.
>
> > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > patch for dividing i/o limit (attached with the mail)
> >
> > Note: Patch and the test results are attached.
> >
>
> I think it is always a good idea to summarize the results and tell
> your conclusion about it.  AFAICT, it seems to me this technique as
> done in patch might not work for the cases when there is an uneven
> amount of work done by parallel workers (say the index sizes vary
> (maybe due partial indexes or index column width or some other
> reasons)).   The reason for it is that when the worker finishes it's
> work we don't rebalance the cost among other workers.
Right, that's one problem I observed.
> Can we generate
> such a test and see how it behaves?  I think it might be possible to
> address this if it turns out to be a problem.
Yeah, we can address this by rebalancing the cost.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > think we should try to come up with a solution for it.
> > > > > > >
> > > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > > proposed doesn't work fine.
> > > > > > >
> > > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > > are still running and then the sleep might not help?
> > > > > > >
> > > > >
> > > > > Remember that the other running workers will also increase
> > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > make sure that overall throttling works the same?
> > > > >
> > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > > workers who are doing very less I/O might become the victim and
> > > > > > unnecessarily delay its operation.
> > > > > >
> > > > >
> > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > limit we want to provide for overall vacuum operation will still be
> > > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > > might have done a major portion of I/O but soon after we start
> > > > > vacuuming the index, we will hit the limit and will sleep.
> > > >
> > > > Actually, What I meant is that the worker who performing actual I/O
> > > > might not go for the delay and another worker which has done only CPU
> > > > operation might pay the penalty?  So basically the worker who is doing
> > > > CPU intensive operation might go for the delay and pay the penalty and
> > > > the worker who is performing actual I/O continues to work and do
> > > > further I/O.  Do you think this is not a practical problem?
> > > >
> > >
> > > I don't know.  Generally, we try to delay (if required) before
> > > processing (read/write) one page which means it will happen for I/O
> > > intensive operations, so I am not sure if the point you are making is
> > > completely correct.
> >
> > Ok, I agree with the point that we are checking it only when we are
> > doing the I/O operation.  But, we also need to consider that each I/O
> > operations have a different weightage.  So even if we have a delay
> > point at I/O operation there is a possibility that we might delay the
> > worker which is just performing read buffer with page
> > hit(VacuumCostPageHit).  But, the other worker who is actually
> > dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> > more I/O.
> >
> > >
> > > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > > one worker who has done more I/O will continue to do further I/O and
> > > > the one which has not done much I/O will not perform more I/O in
> > > > future.  So it might not be too bad if we compute shared costs as you
> > > > suggested above.
> > > >
> > >
> > > I am thinking if we can write the patch for both the approaches (a.
> > > compute shared costs and try to delay based on that, b. try to divide
> > > the I/O cost among workers as described in the email above[1]) and do
> > > some tests to see the behavior of throttling, that might help us in
> > > deciding what is the best strategy to solve this problem, if any.
> > > What do you think?
> >
> > I agree with this idea.  I can come up with a POC patch for approach
> > (b).  Meanwhile, if someone is interested to quickly hack with the
> > approach (a) then we can do some testing and compare.  Sawada-san,
> > by any chance will you be interested to write POC with approach (a)?
> > Otherwise, I will try to write it after finishing the first one
> > (approach b).
> >
> I have come up with the POC for approach (a).
>
> The idea is
> 1) Before launching the worker divide the current VacuumCostBalance
> among workers so that workers start accumulating the balance from that
> point.
> 2) Also, divide the VacuumCostLimit among the workers.
> 3) Once the worker are done with the index vacuum, send back the
> remaining balance with the leader.
> 4) The leader will sum all the balances and add that to its current
> VacuumCostBalance.  And start accumulating its balance from this
> point.
>
> I was trying to test how is the behaviour of the vacuum I/O limit, but
> I could not find an easy way to test that so I just put the tracepoint
> in the code and just checked that at what point we are giving the
> delay.
> I also printed the cost balance at various point to see that after how
> much I/O accumulation we are hitting the delay.  Please feel free to
> suggest a better way to test this.
>
> I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> patch for dividing i/o limit (attached with the mail)
>
> Note: Patch and the test results are attached.
>

Thank you!

For approach (a), the basic idea I've come up with is that we have a
shared balance value in DSM, and each worker, including the leader
process, adds its local balance value to it in vacuum_delay_point and
then sleeps based on the shared value. I'll submit that patch with
other updates.
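
As a rough standalone sketch of that idea (C11 atomics and usleep
stand in for the DSM area and the real vacuum_delay_point; the names
are illustrative and the reset logic is simplified):

#include <stdatomic.h>
#include <unistd.h>

/* Shared between the leader and the workers (lives in DSM in the patch). */
typedef struct SharedCostStateSketch
{
    _Atomic int balance;        /* shared VacuumCostBalance */
    int         limit;          /* VacuumCostLimit for the whole operation */
    int         delay_usec;     /* vacuum_cost_delay */
} SharedCostStateSketch;

/* Called from each process's delay point: publish the locally accumulated
 * cost, and whoever sees the shared balance cross the limit resets it and
 * pays the sleep. */
static void
shared_delay_point(SharedCostStateSketch *shared, int *local_balance)
{
    int         total = atomic_fetch_add(&shared->balance, *local_balance)
                        + *local_balance;

    *local_balance = 0;
    if (total >= shared->limit)
    {
        atomic_store(&shared->balance, 0);
        usleep(shared->delay_usec);
    }
}

int
main(void)
{
    SharedCostStateSketch shared = {.balance = 0, .limit = 200, .delay_usec = 20000};
    int         leader_local = 150;
    int         worker_local = 80;

    shared_delay_point(&shared, &leader_local); /* 150 < 200: no sleep */
    shared_delay_point(&shared, &worker_local); /* 230 >= 200: this caller sleeps */
    return 0;
}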

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I have come up with the POC for approach (a).
> >
> > The idea is
> > 1) Before launching the worker divide the current VacuumCostBalance
> > among workers so that workers start accumulating the balance from that
> > point.
> > 2) Also, divide the VacuumCostLimit among the workers.
> > 3) Once the worker are done with the index vacuum, send back the
> > remaining balance with the leader.
> > 4) The leader will sum all the balances and add that to its current
> > VacuumCostBalance.  And start accumulating its balance from this
> > point.
> >
> > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > I could not find an easy way to test that so I just put the tracepoint
> > in the code and just checked that at what point we are giving the
> > delay.
> > I also printed the cost balance at various point to see that after how
> > much I/O accumulation we are hitting the delay.  Please feel free to
> > suggest a better way to test this.
> >
> > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > patch for dividing i/o limit (attached with the mail)
> >
> > Note: Patch and the test results are attached.
> >
>
> Thank you!
>
> For approach (a) the basic idea I've come up with is that we have a
> shared balance value on DSM and each workers including the leader
> process add its local balance value to it in vacuum_delay_point, and
> then based on the shared value workers sleep. I'll submit that patch
> with other updates.
>

I think it would be better if we can prepare the I/O balance patches
on top of the main patch and evaluate both approaches.  We can test
both approaches and integrate the one that turns out to be better.

Note that I will be away next week, so I won't be able to review your
latest patch unless you are planning to post it today or tomorrow.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Oct 25, 2019 at 7:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I have come up with the POC for approach (a).
> > >
> > > The idea is
> > > 1) Before launching the worker divide the current VacuumCostBalance
> > > among workers so that workers start accumulating the balance from that
> > > point.
> > > 2) Also, divide the VacuumCostLimit among the workers.
> > > 3) Once the worker are done with the index vacuum, send back the
> > > remaining balance with the leader.
> > > 4) The leader will sum all the balances and add that to its current
> > > VacuumCostBalance.  And start accumulating its balance from this
> > > point.
> > >
> > > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > > I could not find an easy way to test that so I just put the tracepoint
> > > in the code and just checked that at what point we are giving the
> > > delay.
> > > I also printed the cost balance at various point to see that after how
> > > much I/O accumulation we are hitting the delay.  Please feel free to
> > > suggest a better way to test this.
> > >
> > > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > > patch for dividing i/o limit (attached with the mail)
> > >
> > > Note: Patch and the test results are attached.
> > >
> >
> > Thank you!
> >
> > For approach (a) the basic idea I've come up with is that we have a
> > shared balance value on DSM and each workers including the leader
> > process add its local balance value to it in vacuum_delay_point, and
> > then based on the shared value workers sleep. I'll submit that patch
> > with other updates.
> >
>
> I think it would be better if we can prepare the I/O balance patches
> on top of main patch and evaluate both approaches.  We can test both
> the approaches and integrate the one which turned out to be good.
>

Just to add something about testing both approaches: I think we can
first come up with a way to compute the throttling vacuum does, as
mentioned by me in one of the emails above [1] or in some other way.
I think Dilip is planning to give it a try, and once we have that we
can evaluate both patches.  Some of the tests I have in mind are:
a. All indexes have an equal amount of deleted data.
b. Indexes have an uneven amount of deleted data.
c. Try with a mix of indexes (btree, gin, gist, hash, etc.) on a table.

Feel free to add more tests.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > > think we should try to come up with a solution for it.
> > > > > > > >
> > > > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > > > proposed doesn't work fine.
> > > > > > > >
> > > > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > > > are still running and then the sleep might not help?
> > > > > > > >
> > > > > >
> > > > > > Remember that the other running workers will also increase
> > > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > > make sure that overall throttling works the same?
> > > > > >
> > > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > > > workers who are doing very less I/O might become the victim and
> > > > > > > unnecessarily delay its operation.
> > > > > > >
> > > > > >
> > > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > > limit we want to provide for overall vacuum operation will still be
> > > > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > > > might have done a major portion of I/O but soon after we start
> > > > > > vacuuming the index, we will hit the limit and will sleep.
> > > > >
> > > > > Actually, What I meant is that the worker who performing actual I/O
> > > > > might not go for the delay and another worker which has done only CPU
> > > > > operation might pay the penalty?  So basically the worker who is doing
> > > > > CPU intensive operation might go for the delay and pay the penalty and
> > > > > the worker who is performing actual I/O continues to work and do
> > > > > further I/O.  Do you think this is not a practical problem?
> > > > >
> > > >
> > > > I don't know.  Generally, we try to delay (if required) before
> > > > processing (read/write) one page which means it will happen for I/O
> > > > intensive operations, so I am not sure if the point you are making is
> > > > completely correct.
> > >
> > > Ok, I agree with the point that we are checking it only when we are
> > > doing the I/O operation.  But, we also need to consider that each I/O
> > > operations have a different weightage.  So even if we have a delay
> > > point at I/O operation there is a possibility that we might delay the
> > > worker which is just performing read buffer with page
> > > hit(VacuumCostPageHit).  But, the other worker who is actually
> > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> > > more I/O.
> > >
> > > >
> > > > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > > > one worker who has done more I/O will continue to do further I/O and
> > > > > the one which has not done much I/O will not perform more I/O in
> > > > > future.  So it might not be too bad if we compute shared costs as you
> > > > > suggested above.
> > > > >
> > > >
> > > > I am thinking if we can write the patch for both the approaches (a.
> > > > compute shared costs and try to delay based on that, b. try to divide
> > > > the I/O cost among workers as described in the email above[1]) and do
> > > > some tests to see the behavior of throttling, that might help us in
> > > > deciding what is the best strategy to solve this problem, if any.
> > > > What do you think?
> > >
> > > I agree with this idea.  I can come up with a POC patch for approach
> > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > by any chance will you be interested to write POC with approach (a)?
> > > Otherwise, I will try to write it after finishing the first one
> > > (approach b).
> > >
> > I have come up with the POC for approach (a).
> >
> > The idea is
> > 1) Before launching the worker divide the current VacuumCostBalance
> > among workers so that workers start accumulating the balance from that
> > point.
> > 2) Also, divide the VacuumCostLimit among the workers.
> > 3) Once the worker are done with the index vacuum, send back the
> > remaining balance with the leader.
> > 4) The leader will sum all the balances and add that to its current
> > VacuumCostBalance.  And start accumulating its balance from this
> > point.
> >
> > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > I could not find an easy way to test that so I just put the tracepoint
> > in the code and just checked that at what point we are giving the
> > delay.
> > I also printed the cost balance at various point to see that after how
> > much I/O accumulation we are hitting the delay.  Please feel free to
> > suggest a better way to test this.
> >
> > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > patch for dividing i/o limit (attached with the mail)
> >
> > Note: Patch and the test results are attached.
> >
>
> Thank you!
>
> For approach (a) the basic idea I've come up with is that we have a
> shared balance value on DSM and each workers including the leader
> process add its local balance value to it in vacuum_delay_point, and
> then based on the shared value workers sleep. I'll submit that patch
> with other updates.
IMHO, if we add the local balance to the shared balance in
vacuum_delay_point while each worker works with the full limit, then
there will be a problem, right?  Suppose VacuumCostLimit is 2000; each
worker would first hit the delay in vacuum_delay_point only once its
own local balance reaches 2000, so in most cases the first delay would
be hit when their gross I/O is 6000 (if there are 3 workers).

I think if we want to have the shared accounting then we must always
accumulate the balance in a shared variable, so that as soon as the
gross balance hits VacuumCostLimit we can hit the delay point.

Maybe we can do this (a rough sketch follows below):
1. Change VacuumCostBalance from an integer to a pg_atomic_uint32 *.
2. In the heap_parallel_vacuum_main function, make it point to a
shared memory location.  Basically, for the non-parallel case it will
point to the process-specific global variable, whereas in the parallel
case it will point to a shared memory variable.
3. Wherever we use VacuumCostBalance in the code (I think 5-6
occurrences), change it to use atomic operations.
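
To make that concrete, here is a minimal sketch of the three steps
(illustrative only, not taken from any patch; the names below are made
up):

#include "postgres.h"
#include "port/atomics.h"

/* (1) the global balance becomes a pointer to an atomic counter */
static pg_atomic_uint32 LocalVacuumCostBalance;	/* pg_atomic_init_u32'd at startup */
pg_atomic_uint32 *VacuumCostBalancePtr = &LocalVacuumCostBalance;

/* (2) the parallel vacuum worker retargets the pointer to the DSM area */
void
attach_shared_cost_balance(pg_atomic_uint32 *shared_balance)
{
	VacuumCostBalancePtr = shared_balance;
}

/* (3) every place that bumped VacuumCostBalance now uses an atomic add */
static inline void
add_vacuum_cost(int cost)
{
	pg_atomic_add_fetch_u32(VacuumCostBalancePtr, cost);
}

With this, the cost accounting code stays the same in the parallel and
non-parallel cases; only the target of the pointer differs.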

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 25, 2019 at 12:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > > > think we should try to come up with a solution for it.
> > > > > > > > >
> > > > > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > > > > proposed doesn't work fine.
> > > > > > > > >
> > > > > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > > > > are still running and then the sleep might not help?
> > > > > > > > >
> > > > > > >
> > > > > > > Remember that the other running workers will also increase
> > > > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > > > make sure that overall throttling works the same?
> > > > > > >
> > > > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > > > > workers who are doing very less I/O might become the victim and
> > > > > > > > unnecessarily delay its operation.
> > > > > > > >
> > > > > > >
> > > > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > > > limit we want to provide for overall vacuum operation will still be
> > > > > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > > > > might have done a major portion of I/O but soon after we start
> > > > > > > vacuuming the index, we will hit the limit and will sleep.
> > > > > >
> > > > > > Actually, What I meant is that the worker who performing actual I/O
> > > > > > might not go for the delay and another worker which has done only CPU
> > > > > > operation might pay the penalty?  So basically the worker who is doing
> > > > > > CPU intensive operation might go for the delay and pay the penalty and
> > > > > > the worker who is performing actual I/O continues to work and do
> > > > > > further I/O.  Do you think this is not a practical problem?
> > > > > >
> > > > >
> > > > > I don't know.  Generally, we try to delay (if required) before
> > > > > processing (read/write) one page which means it will happen for I/O
> > > > > intensive operations, so I am not sure if the point you are making is
> > > > > completely correct.
> > > >
> > > > Ok, I agree with the point that we are checking it only when we are
> > > > doing the I/O operation.  But, we also need to consider that each I/O
> > > > operations have a different weightage.  So even if we have a delay
> > > > point at I/O operation there is a possibility that we might delay the
> > > > worker which is just performing read buffer with page
> > > > hit(VacuumCostPageHit).  But, the other worker who is actually
> > > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> > > > more I/O.
> > > >
> > > > >
> > > > > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > > > > one worker who has done more I/O will continue to do further I/O and
> > > > > > the one which has not done much I/O will not perform more I/O in
> > > > > > future.  So it might not be too bad if we compute shared costs as you
> > > > > > suggested above.
> > > > > >
> > > > >
> > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > some tests to see the behavior of throttling, that might help us in
> > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > What do you think?
> > > >
> > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > by any chance will you be interested to write POC with approach (a)?
> > > > Otherwise, I will try to write it after finishing the first one
> > > > (approach b).
> > > >
> > > I have come up with the POC for approach (a).
> > >
> > > The idea is
> > > 1) Before launching the worker divide the current VacuumCostBalance
> > > among workers so that workers start accumulating the balance from that
> > > point.
> > > 2) Also, divide the VacuumCostLimit among the workers.
> > > 3) Once the worker are done with the index vacuum, send back the
> > > remaining balance with the leader.
> > > 4) The leader will sum all the balances and add that to its current
> > > VacuumCostBalance.  And start accumulating its balance from this
> > > point.
> > >
> > > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > > I could not find an easy way to test that so I just put the tracepoint
> > > in the code and just checked that at what point we are giving the
> > > delay.
> > > I also printed the cost balance at various point to see that after how
> > > much I/O accumulation we are hitting the delay.  Please feel free to
> > > suggest a better way to test this.
> > >
> > > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > > patch for dividing i/o limit (attached with the mail)
> > >
> > > Note: Patch and the test results are attached.
> > >
> >
> > Thank you!
> >
> > For approach (a) the basic idea I've come up with is that we have a
> > shared balance value on DSM and each workers including the leader
> > process add its local balance value to it in vacuum_delay_point, and
> > then based on the shared value workers sleep. I'll submit that patch
> > with other updates.
> IMHO, if we add the local balance to the shared balance in
> vacuum_delay_point and each worker is working with full limit then
> there will be a problem right? because suppose VacuumCostLimit is 2000
> then the first time each worker hit the vacuum_delay_point when their
> local limit will be 2000 so in most cases, the first delay will be hit
> when there gross I/O is 6000 (if there are 3 workers).

To explain my idea in more detail: each worker that enters
vacuum_delay_point adds its local value to the shared value and resets
its local value to 0. The worker then sleeps if the shared value
exceeds VacuumCostLimit, but before sleeping it subtracts
VacuumCostLimit from the shared value. Since vacuum_delay_point is
typically called per page processed, I expect there will be no such
problem. Thoughts?
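
A simplified stand-in for what vacuum_delay_point could do under this
scheme might look like this (just a sketch of the behaviour described
above, assuming the shared counter, here called
VacuumSharedCostBalance, lives in DSM; this is not the patch itself):

static void
shared_vacuum_delay_point(void)
{
	uint32		shared_balance;

	/* fold the local balance into the shared counter and reset it */
	shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
											 VacuumCostBalance);
	VacuumCostBalance = 0;

	if (shared_balance >= (uint32) VacuumCostLimit)
	{
		/* pay for the sleep we are about to take */
		pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostLimit);
		pg_usleep((long) (VacuumCostDelay * 1000L));
	}
}

That way the throttling is based on the sum of the I/O done by all
workers, and whichever worker pushes the shared balance over the limit
takes the sleep.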

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 12:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > > > > think we should try to come up with a solution for it.
> > > > > > > > > >
> > > > > > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > > > > > proposed doesn't work fine.
> > > > > > > > > >
> > > > > > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > > > > > are still running and then the sleep might not help?
> > > > > > > > > >
> > > > > > > >
> > > > > > > > Remember that the other running workers will also increase
> > > > > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > > > > make sure that overall throttling works the same?
> > > > > > > >
> > > > > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > > > > > workers who are doing very less I/O might become the victim and
> > > > > > > > > unnecessarily delay its operation.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > > > > limit we want to provide for overall vacuum operation will still be
> > > > > > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > > > > > might have done a major portion of I/O but soon after we start
> > > > > > > > vacuuming the index, we will hit the limit and will sleep.
> > > > > > >
> > > > > > > Actually, What I meant is that the worker who performing actual I/O
> > > > > > > might not go for the delay and another worker which has done only CPU
> > > > > > > operation might pay the penalty?  So basically the worker who is doing
> > > > > > > CPU intensive operation might go for the delay and pay the penalty and
> > > > > > > the worker who is performing actual I/O continues to work and do
> > > > > > > further I/O.  Do you think this is not a practical problem?
> > > > > > >
> > > > > >
> > > > > > I don't know.  Generally, we try to delay (if required) before
> > > > > > processing (read/write) one page which means it will happen for I/O
> > > > > > intensive operations, so I am not sure if the point you are making is
> > > > > > completely correct.
> > > > >
> > > > > Ok, I agree with the point that we are checking it only when we are
> > > > > doing the I/O operation.  But, we also need to consider that each I/O
> > > > > operations have a different weightage.  So even if we have a delay
> > > > > point at I/O operation there is a possibility that we might delay the
> > > > > worker which is just performing read buffer with page
> > > > > hit(VacuumCostPageHit).  But, the other worker who is actually
> > > > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> > > > > more I/O.
> > > > >
> > > > > >
> > > > > > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > > > > > one worker who has done more I/O will continue to do further I/O and
> > > > > > > the one which has not done much I/O will not perform more I/O in
> > > > > > > future.  So it might not be too bad if we compute shared costs as you
> > > > > > > suggested above.
> > > > > > >
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
> > > >
> > > > The idea is
> > > > 1) Before launching the worker divide the current VacuumCostBalance
> > > > among workers so that workers start accumulating the balance from that
> > > > point.
> > > > 2) Also, divide the VacuumCostLimit among the workers.
> > > > 3) Once the worker are done with the index vacuum, send back the
> > > > remaining balance with the leader.
> > > > 4) The leader will sum all the balances and add that to its current
> > > > VacuumCostBalance.  And start accumulating its balance from this
> > > > point.
> > > >
> > > > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > > > I could not find an easy way to test that so I just put the tracepoint
> > > > in the code and just checked that at what point we are giving the
> > > > delay.
> > > > I also printed the cost balance at various point to see that after how
> > > > much I/O accumulation we are hitting the delay.  Please feel free to
> > > > suggest a better way to test this.
> > > >
> > > > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > > > patch for dividing i/o limit (attached with the mail)
> > > >
> > > > Note: Patch and the test results are attached.
> > > >
> > >
> > > Thank you!
> > >
> > > For approach (a) the basic idea I've come up with is that we have a
> > > shared balance value on DSM and each workers including the leader
> > > process add its local balance value to it in vacuum_delay_point, and
> > > then based on the shared value workers sleep. I'll submit that patch
> > > with other updates.
> > IMHO, if we add the local balance to the shared balance in
> > vacuum_delay_point and each worker is working with full limit then
> > there will be a problem right? because suppose VacuumCostLimit is 2000
> > then the first time each worker hit the vacuum_delay_point when their
> > local limit will be 2000 so in most cases, the first delay will be hit
> > when there gross I/O is 6000 (if there are 3 workers).
>
> For more detail of my idea it is that the first worker who entered to
> vacuum_delay_point adds its local value to shared value and reset the
> local value to 0. And then the worker sleeps if it exceeds
> VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> from the shared value. Since vacuum_delay_point are typically called
> per page processed I expect there will not such problem. Thoughts?

Oh right, I assumed that you would add the local balance to the
shared value only when the local balance exceeds VacuumCostLimit, but
you are adding it to the shared value every time in vacuum_delay_point.
So I think your idea is correct.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > For more detail of my idea it is that the first worker who entered to
> > vacuum_delay_point adds its local value to shared value and reset the
> > local value to 0. And then the worker sleeps if it exceeds
> > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > from the shared value. Since vacuum_delay_point are typically called
> > per page processed I expect there will not such problem. Thoughts?
>
> Oh right, I assumed that when the local balance is exceeding the
> VacuumCostLimit that time you are adding it to the shared value but
> you are adding it to to shared value every time in vacuum_delay_point.
> So I think your idea is correct.

I've attached the updated patch set.

The first three patches add new variables and a callback to the index AM.

The next two patches are the main part of supporting parallel vacuum.
I've incorporated all review comments I got so far. The memory layout
of the variable-length index statistics might be a bit complex. It's
similar to the format of the heap tuple header, having a null bitmap,
and both the size of the index statistics and the actual data for each
index follow.
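
Roughly, the layout is along these lines (simplified here just for
illustration; please see the patch for the actual definitions, which
may differ):

/* header of the shared stats area, followed by a null bitmap */
typedef struct LVSharedIndStatsArea
{
	int		nindexes;
	bits8	bitmap[FLEXIBLE_ARRAY_MEMBER];	/* 1 bit per index */
} LVSharedIndStatsArea;

/* per-index entry placed after the (MAXALIGN'd) bitmap */
typedef struct LVSharedIndStats
{
	bool	updated;	/* has some worker filled this in yet? */
	Size	size;		/* size of the bulk-delete result that follows */
	/* an IndexBulkDeleteResult of 'size' bytes follows */
} LVSharedIndStats;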

The last patch is a PoC patch that implements the shared vacuum cost
balance. For now it's separate, but after testing both approaches it
will be merged into the 0004 patch. I'll test both next week.

This patch set can be applied on top of the patch[1] that improves
gist index bulk-deletion, so canparallelvacuum of the gist index is true.

[1] https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com

Regards,

--
Masahiko Sawada

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > For more detail of my idea it is that the first worker who entered to
> > > vacuum_delay_point adds its local value to shared value and reset the
> > > local value to 0. And then the worker sleeps if it exceeds
> > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > > from the shared value. Since vacuum_delay_point are typically called
> > > per page processed I expect there will not such problem. Thoughts?
> >
> > Oh right, I assumed that when the local balance is exceeding the
> > VacuumCostLimit that time you are adding it to the shared value but
> > you are adding it to to shared value every time in vacuum_delay_point.
> > So I think your idea is correct.
>
> I've attached the updated patch set.
>
> First three patches add new variables and a callback to index AM.
>
> Next two patches are the main part to support parallel vacuum. I've
> incorporated all review comments I got so far. The memory layout of
> variable-length index statistics might be complex a bit. It's similar
> to the format of heap tuple header, having a null bitmap. And both the
> size of index statistics and actual data for each indexes follows.
>
> Last patch is a PoC patch that implements the shared vacuum cost
> balance. For now it's separated but after testing both approaches it
> will be merged to 0004 patch. I'll test both next week.
>
> This patch set can be applied on top of the patch[1] that improves
> gist index bulk-deletion. So canparallelvacuum of gist index is true.
>
> [1] https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com
>
I haven't yet read the new set of patches, but I have noticed one
thing: we are getting the size of the statistics using the AM routine,
but we are copying those statistics from local memory to shared memory
directly using memcpy.  Wouldn't it be a good idea to have an
AM-specific routine to copy them from local memory to shared memory?
I am not sure whether it is worth it, but my thought behind this point
is that it would allow the AM to keep its local stats in any form
(e.g. it could store a pointer in there) and serialize them while
copying to the shared stats.  Later, when the shared stats are passed
back to the AM, it can deserialize them into its local form and use
them.

+ * Since all vacuum workers write the bulk-deletion result at
+ * different slots we can write them without locking.
+ */
+ if (!shared_indstats->updated && stats[idx] != NULL)
+ {
+ memcpy(bulkdelete_res, stats[idx], shared_indstats->size);
+ shared_indstats->updated = true;
+
+ /*
+ * no longer need the locally allocated result and now
+ * stats[idx] points to the DSM segment.
+ */
+ pfree(stats[idx]);
+ stats[idx] = bulkdelete_res;
+ }
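
To illustrate the suggestion, such AM callbacks could have shapes
along these lines (hypothetical names and signatures, nothing that
exists in the patch or in core today):

/* serialize the AM's local bulk-delete stats into the shared area */
typedef void (*amserializebulkdeletestats_function) (IndexBulkDeleteResult *local_stats,
													 void *shared_dest,
													 Size shared_size);

/* reconstruct the AM's local representation from the shared area */
typedef IndexBulkDeleteResult *(*amdeserializebulkdeletestats_function) (void *shared_src,
																		 Size shared_size);

The default behaviour could remain a plain memcpy of an
IndexBulkDeleteResult, so AMs that don't need anything special
wouldn't have to provide these.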

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > some tests to see the behavior of throttling, that might help us in
> > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > What do you think?
> > > >
> > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > by any chance will you be interested to write POC with approach (a)?
> > > > Otherwise, I will try to write it after finishing the first one
> > > > (approach b).
> > > >
> > > I have come up with the POC for approach (a).

> > Can we compute the overall throttling (sleep time) in the operation
> > separately for heap and index, then divide the index's sleep_time with
> > a number of workers and add it to heap's sleep time?  Then, it will be
> > a bit easier to compare the data between parallel and non-parallel
> > case.
I have come up with a patch to compute the total delay during the
vacuum.  The idea of computing the total cost delay is

Total cost delay = Total delay of heap scan + Total delay of
index vacuuming per worker;  the patch for this is attached.

I have prepared this patch on top of the latest parallel vacuum
patch[1].  I have also rebased the patch for approach (b) for dividing
the vacuum cost limit and done some testing of the I/O throttling.
The attached patches 0001-POC-compute-total-cost-delay and
0002-POC-divide-vacuum-cost-limit can be applied on top of
v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
rebased on top of v31-0006, because v31-0006 implements the I/O
throttling with one approach and 0002-POC-divide-vacuum-cost-limit
does the same with another approach.  But
0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
well (just a 1-2 line conflict).

Testing:  I have performed 2 tests, one with same-size indexes and a
second with different-size indexes, and measured the total I/O delay
with the attached patch.

Setup:
VacuumCostDelay=10ms
VacuumCostLimit=2000

Test1 (Same size index):
create table test(a int, b varchar, c varchar);
create index idx1 on test(a);
create index idx2 on test(b);
create index idx3 on test(c);
insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
generate_series(1,500000) as i;
delete from test where a < 200000;

                  Vacuum (Head)    Parallel Vacuum    Vacuum Cost Divide Patch
Total Delay       1784 (ms)        1398 (ms)          1938 (ms)


Test2 (Variable size dead tuple in index)
create table test(a int, b varchar, c varchar);
create index idx1 on test(a);
create index idx2 on test(b) where a > 100000;
create index idx3 on test(c) where a > 150000;

insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
generate_series(1,500000) as i;
delete from test where a < 200000;

                  Vacuum (Head)    Parallel Vacuum    Vacuum Cost Divide Patch
Total Delay       1438 (ms)        1029 (ms)          1529 (ms)


Conclusion:
1. The tests show that the total I/O delay is significantly less with
the parallel vacuum.
2. With the vacuum-cost-divide patch the problem is solved, but the
delay is a bit more compared to the non-parallel version.  The reason
could be the problem discussed at [2], but it needs further
investigation.

Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
will also try to test different types of indexes.

[1] https://www.postgresql.org/message-id/CAD21AoBMo9dr_QmhT%3DdKh7fmiq7tpx%2ByLHR8nw9i5NZ-SgtaVg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> I haven't yet read the new set of the patch.  But, I have noticed one
> thing.  That we are getting the size of the statistics using the AM
> routine.  But, we are copying those statistics from local memory to
> the shared memory directly using the memcpy.   Wouldn't it be a good
> idea to have an AM specific routine to get it copied from the local
> memory to the shared memory?  I am not sure it is worth it or not but
> my thought behind this point is that it will give AM to have local
> stats in any form ( like they can store a pointer in that ) but they
> can serialize that while copying to shared stats.  And, later when
> shared stats are passed back to the Am then it can deserialize in its
> local form and use it.
>

You have a point, but after changing the gist index, we don't have
any current usage for indexes that need something like that.  So, on
one side there is some value in having an API to copy the stats, but
on the other side, without a clear use case for such an API, it might
not be good to expose a new one.  I think we can expose such an API in
the future if the need arises.  Do you or anyone know of any external
IndexAM that has such a need?

Few minor comments while glancing through the latest patchset.

1. I think you can merge 0001*, 0002*, 0003* patch into one patch as
all three expose new variable/function from IndexAmRoutine.

2.
+prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
+{
+ char *p = (char *) GetSharedIndStats(lvshared);
+ int vac_work_mem = IsAutoVacuumWorkerProcess() &&
+ autovacuum_work_mem != -1 ?
+ autovacuum_work_mem : maintenance_work_mem;

I think this function won't be called from an autovacuum worker
process, at least not as of now, so isn't it a better idea to have an
Assert for it?

3.
+void
+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)

This function is for performing a parallel operation on the index, so
why start its name with heap?  It would be better to name it
index_parallel_vacuum_main or simply parallel_vacuum_main.

4.
/* useindex = true means two-pass strategy; false means one-pass */
@@ -128,17 +280,12 @@ typedef struct LVRelStats
  BlockNumber pages_removed;
  double tuples_deleted;
  BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
- /* List of TIDs of tuples we intend to delete */
- /* NB: this list is ordered by TID address */
- int num_dead_tuples; /* current # of entries */
- int max_dead_tuples; /* # slots allocated in array */
- ItemPointer dead_tuples; /* array of ItemPointerData */
+ LVDeadTuples *dead_tuples;
  int num_index_scans;
  TransactionId latestRemovedXid;
  bool lock_waiter_detected;
 } LVRelStats;

-
 /* A few variables that don't seem worth passing around as parameters */
 static int elevel = -1;

It seems like a spurious line removal.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Oct 28, 2019 at 12:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > I haven't yet read the new set of the patch.  But, I have noticed one
> > thing.  That we are getting the size of the statistics using the AM
> > routine.  But, we are copying those statistics from local memory to
> > the shared memory directly using the memcpy.   Wouldn't it be a good
> > idea to have an AM specific routine to get it copied from the local
> > memory to the shared memory?  I am not sure it is worth it or not but
> > my thought behind this point is that it will give AM to have local
> > stats in any form ( like they can store a pointer in that ) but they
> > can serialize that while copying to shared stats.  And, later when
> > shared stats are passed back to the Am then it can deserialize in its
> > local form and use it.
> >
>
> You have a point, but after changing the gist index, we don't have any
> current usage for indexes that need something like that. So, on one
> side there is some value in having an API to copy the stats, but on
> the other side without having clear usage of an API, it might not be
> good to expose a new API for the same.   I think we can expose such an
> API in the future if there is a need for the same.
I agree with the point.  But the current patch already exposes an API
for estimating the size of the statistics.  So IMHO, either we expose
both APIs (for estimating the size of the stats and for copying the
stats) or neither.  Am I missing something here?

> Do you or anyone
> know of any external IndexAM that has such a need?
>
> Few minor comments while glancing through the latest patchset.
>
> 1. I think you can merge 0001*, 0002*, 0003* patch into one patch as
> all three expose new variable/function from IndexAmRoutine.
>
> 2.
> +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> +{
> + char *p = (char *) GetSharedIndStats(lvshared);
> + int vac_work_mem = IsAutoVacuumWorkerProcess() &&
> + autovacuum_work_mem != -1 ?
> + autovacuum_work_mem : maintenance_work_mem;
>
> I think this function won't be called from AutoVacuumWorkerProcess at
> least not as of now, so isn't it a better idea to have an Assert for
> it?
>
> 3.
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>
> This function is for performing a parallel operation on the index, so
> why to start with heap?  It is better to name it as
> index_parallel_vacuum_main or simply parallel_vacuum_main.
>
> 4.
> /* useindex = true means two-pass strategy; false means one-pass */
> @@ -128,17 +280,12 @@ typedef struct LVRelStats
>   BlockNumber pages_removed;
>   double tuples_deleted;
>   BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
> - /* List of TIDs of tuples we intend to delete */
> - /* NB: this list is ordered by TID address */
> - int num_dead_tuples; /* current # of entries */
> - int max_dead_tuples; /* # slots allocated in array */
> - ItemPointer dead_tuples; /* array of ItemPointerData */
> + LVDeadTuples *dead_tuples;
>   int num_index_scans;
>   TransactionId latestRemovedXid;
>   bool lock_waiter_detected;
>  } LVRelStats;
>
> -
>  /* A few variables that don't seem worth passing around as parameters */
>  static int elevel = -1;
>
> It seems like a spurious line removal.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > For more detail of my idea it is that the first worker who entered to
> > > vacuum_delay_point adds its local value to shared value and reset the
> > > local value to 0. And then the worker sleeps if it exceeds
> > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > > from the shared value. Since vacuum_delay_point are typically called
> > > per page processed I expect there will not such problem. Thoughts?
> >
> > Oh right, I assumed that when the local balance is exceeding the
> > VacuumCostLimit that time you are adding it to the shared value but
> > you are adding it to to shared value every time in vacuum_delay_point.
> > So I think your idea is correct.
>
> I've attached the updated patch set.
>
> First three patches add new variables and a callback to index AM.
>
> Next two patches are the main part to support parallel vacuum. I've
> incorporated all review comments I got so far. The memory layout of
> variable-length index statistics might be complex a bit. It's similar
> to the format of heap tuple header, having a null bitmap. And both the
> size of index statistics and actual data for each indexes follows.
>
> Last patch is a PoC patch that implements the shared vacuum cost
> balance. For now it's separated but after testing both approaches it
> will be merged to 0004 patch. I'll test both next week.
>
> This patch set can be applied on top of the patch[1] that improves
> gist index bulk-deletion. So canparallelvacuum of gist index is true.
>

+ /* Get the space for IndexBulkDeleteResult */
+ bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats);
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result
+ * if someone has already updated it.
+ */
+ if (shared_indstats->updated && stats[idx] == NULL)
+ stats[idx] = bulkdelete_res;
+

I have a doubt about this hunk: I do not understand when this
condition will be hit.  Whenever we set shared_indstats->updated to
true, we also set stats[idx] to the shared stats at the same time.  So
I am not sure in what case shared_indstats->updated will be true while
stats[idx] is still pointing to NULL.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Oct 28, 2019 at 6:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > For more detail of my idea it is that the first worker who entered to
> > > > vacuum_delay_point adds its local value to shared value and reset the
> > > > local value to 0. And then the worker sleeps if it exceeds
> > > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > > > from the shared value. Since vacuum_delay_point are typically called
> > > > per page processed I expect there will not such problem. Thoughts?
> > >
> > > Oh right, I assumed that when the local balance is exceeding the
> > > VacuumCostLimit that time you are adding it to the shared value but
> > > you are adding it to to shared value every time in vacuum_delay_point.
> > > So I think your idea is correct.
> >
> > I've attached the updated patch set.
> >
> > First three patches add new variables and a callback to index AM.
> >
> > Next two patches are the main part to support parallel vacuum. I've
> > incorporated all review comments I got so far. The memory layout of
> > variable-length index statistics might be complex a bit. It's similar
> > to the format of heap tuple header, having a null bitmap. And both the
> > size of index statistics and actual data for each indexes follows.
> >
> > Last patch is a PoC patch that implements the shared vacuum cost
> > balance. For now it's separated but after testing both approaches it
> > will be merged to 0004 patch. I'll test both next week.
> >
> > This patch set can be applied on top of the patch[1] that improves
> > gist index bulk-deletion. So canparallelvacuum of gist index is true.
> >
>
> + /* Get the space for IndexBulkDeleteResult */
> + bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats);
> +
> + /*
> + * Update the pointer to the corresponding bulk-deletion result
> + * if someone has already updated it.
> + */
> + if (shared_indstats->updated && stats[idx] == NULL)
> + stats[idx] = bulkdelete_res;
> +
>
> I have a doubt in this hunk,  I do not understand when this condition
> will be hit?  Because whenever we are setting shared_indstats->updated
> to true at the same time we are setting stats[idx] to shared stat.  So
> I am not sure in what case the shared_indstats->updated will be true
> but stats[idx] is still pointing to NULL?
>

I think it can be true in the case where a parallel vacuum worker
vacuums an index that was vacuumed by another worker in a previous
index vacuum cycle.  Suppose that worker-A and worker-B vacuumed
index-A and index-B respectively, and then worker-A vacuums index-B in
the next index vacuum cycle.  In this case, shared_indstats->updated
is true because worker-B already vacuumed index-B in the previous
cycle.  On the other hand, stats[idx] on worker-A is NULL because it's
the first time worker-A vacuums index-B.  Therefore worker-A updates
its stats[idx] to point to the bulk-deletion result on DSM in order to
pass it to the index AM.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 10:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Oct 28, 2019 at 6:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > For more detail of my idea it is that the first worker who entered to
> > > > > vacuum_delay_point adds its local value to shared value and reset the
> > > > > local value to 0. And then the worker sleeps if it exceeds
> > > > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > > > > from the shared value. Since vacuum_delay_point are typically called
> > > > > per page processed I expect there will not such problem. Thoughts?
> > > >
> > > > Oh right, I assumed that when the local balance is exceeding the
> > > > VacuumCostLimit that time you are adding it to the shared value but
> > > > you are adding it to to shared value every time in vacuum_delay_point.
> > > > So I think your idea is correct.
> > >
> > > I've attached the updated patch set.
> > >
> > > First three patches add new variables and a callback to index AM.
> > >
> > > Next two patches are the main part to support parallel vacuum. I've
> > > incorporated all review comments I got so far. The memory layout of
> > > variable-length index statistics might be complex a bit. It's similar
> > > to the format of heap tuple header, having a null bitmap. And both the
> > > size of index statistics and actual data for each indexes follows.
> > >
> > > Last patch is a PoC patch that implements the shared vacuum cost
> > > balance. For now it's separated but after testing both approaches it
> > > will be merged to 0004 patch. I'll test both next week.
> > >
> > > This patch set can be applied on top of the patch[1] that improves
> > > gist index bulk-deletion. So canparallelvacuum of gist index is true.
> > >
> >
> > + /* Get the space for IndexBulkDeleteResult */
> > + bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats);
> > +
> > + /*
> > + * Update the pointer to the corresponding bulk-deletion result
> > + * if someone has already updated it.
> > + */
> > + if (shared_indstats->updated && stats[idx] == NULL)
> > + stats[idx] = bulkdelete_res;
> > +
> >
> > I have a doubt in this hunk,  I do not understand when this condition
> > will be hit?  Because whenever we are setting shared_indstats->updated
> > to true at the same time we are setting stats[idx] to shared stat.  So
> > I am not sure in what case the shared_indstats->updated will be true
> > but stats[idx] is still pointing to NULL?
> >
>
> I think it can be true in the case where one parallel vacuum worker
> vacuums the index that was vacuumed by other workers in previous index
> vacuum cycle. Suppose that worker-A and worker-B vacuumed index-A and
> index-B respectively. After that worker-A vacuum index-B in the next
> index vacuum cycle. In this case, shared_indstats->updated is true
> because worker-B already vacuumed in the previous vacuum cycle. On the
> other hand stats[idx] on worker-A is NULL because it's first time for
> worker-A to vacuum index-B. Therefore worker-A updates its stats[idx]
> to the bulk-deletion result on DSM in order to pass it to the index
> AM.
Okay, that makes sense.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum.  So the idea of computing the total cost delay is
>
> Total cost delay = Total dealy of heap scan + Total dealy of
> index/worker;  Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1].  I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach.   But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing:  I have performed 2 tests, one with the same size indexes and
> second with the different size indexes and measured total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                       Vacuum (Head)                   Parallel Vacuum
>            Vacuum Cost Divide Patch
> Total Delay        1784 (ms)                           1398(ms)
>                  1938(ms)
>
>
> Test2 (Variable size dead tuple in index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
> Vacuum (Head)                                   Parallel Vacuum
>               Vacuum Cost Divide Patch
> Total Delay 1438 (ms)                               1029(ms)
>                    1529(ms)
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide the problem is solved but the delay bit
> more compared to the non-parallel version.  The reason could be the
> problem discussed at[2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> will also try to test different types of indexes.
>

Thank you for testing!

I realized that the v31-0006 patch doesn't work correctly, so I've
attached an updated version that also incorporates some comments I got
so far. Sorry for the inconvenience. I'll apply your 0001 patch and
also test the total delay time.

Regards,

--
Masahiko Sawada

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > > What do you think?
> > > > > >
> > > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > > Otherwise, I will try to write it after finishing the first one
> > > > > > (approach b).
> > > > > >
> > > > > I have come up with the POC for approach (a).
> >
> > > > Can we compute the overall throttling (sleep time) in the operation
> > > > separately for heap and index, then divide the index's sleep_time with
> > > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > > a bit easier to compare the data between parallel and non-parallel
> > > > case.
> > I have come up with a patch to compute the total delay during the
> > vacuum.  So the idea of computing the total cost delay is
> >
> > Total cost delay = Total dealy of heap scan + Total dealy of
> > index/worker;  Patch is attached for the same.
> >
> > I have prepared this patch on the latest patch of the parallel
> > vacuum[1].  I have also rebased the patch for the approach [b] for
> > dividing the vacuum cost limit and done some testing for computing the
> > I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> > and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> > rebased on top of v31-0006, because v31-0006 is implementing the I/O
> > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> > doing the same with another approach.   But,
> > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> > well (just 1-2 lines conflict).
> >
> > Testing:  I have performed 2 tests, one with the same size indexes and
> > second with the different size indexes and measured total I/O delay
> > with the attached patch.
> >
> > Setup:
> > VacuumCostDelay=10ms
> > VacuumCostLimit=2000
> >
> > Test1 (Same size index):
> > create table test(a int, b varchar, c varchar);
> > create index idx1 on test(a);
> > create index idx2 on test(b);
> > create index idx3 on test(c);
> > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > generate_series(1,500000) as i;
> > delete from test where a < 200000;
> >
> >                       Vacuum (Head)                   Parallel Vacuum
> >            Vacuum Cost Divide Patch
> > Total Delay        1784 (ms)                           1398(ms)
> >                  1938(ms)
> >
> >
> > Test2 (Variable size dead tuple in index)
> > create table test(a int, b varchar, c varchar);
> > create index idx1 on test(a);
> > create index idx2 on test(b) where a > 100000;
> > create index idx3 on test(c) where a > 150000;
> >
> > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > generate_series(1,500000) as i;
> > delete from test where a < 200000;
> >
> > Vacuum (Head)                                   Parallel Vacuum
> >               Vacuum Cost Divide Patch
> > Total Delay 1438 (ms)                               1029(ms)
> >                    1529(ms)
> >
> >
> > Conclusion:
> > 1. The tests prove that the total I/O delay is significantly less with
> > the parallel vacuum.
> > 2. With the vacuum cost divide the problem is solved but the delay bit
> > more compared to the non-parallel version.  The reason could be the
> > problem discussed at[2], but it needs further investigation.
> >
> > Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> > will also try to test different types of indexes.
> >
>
> Thank you for testing!
>
> I realized that v31-0006 patch doesn't work fine so I've attached the
> updated version patch that also incorporated some comments I got so
> far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> test the total delay time.
>

FWIW I'd like to share the results of total delay time evaluation of
approach (a) (shared cost balance). I used the same workloads that
Dilip shared and set vacuum_cost_delay to 10. The results of two test
cases are here:

* Test1
normal      : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552)
2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477)
1 worker   : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811)

* Test2
normal      : 1530 ms (hit 30645, miss 2, dirty 3, total 30650)
2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
1 worker   : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)

'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer
misses and dirty-buffer flushes, respectively. 'total' is the sum of
these three values.

In this evaluation I expected the parallel vacuum cases to be delayed as
much as the normal vacuum, because the total number of pages to vacuum
is the same, we have the shared cost balance value, and each worker
decides to sleep based on that value. According to the above Test1
results there is a big difference in the total delay time among these
cases (the normal vacuum case is the shortest), but the cause is that
parallel vacuum had to flush more dirty pages. After increasing
shared_buffers I got the expected results:

* Test1 (after increased shared_buffers)
normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)

I updated the patch that computes the total cost delay shared by
Dilip[1] so that it also collects the numbers of buffer hits, misses and
dirty-buffer flushes, and have attached it. It can be applied on top of
my latest patch set[2].
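
For reference, here is a minimal sketch (not the patch code) of the
shared-balance check that approach (a) performs before each page I/O.
VacuumSharedCostBalance and shared_vacuum_delay_point are names assumed
here for illustration, and the sketch ignores the races the real code
has to handle:

#include "postgres.h"
#include "miscadmin.h"
#include "port/atomics.h"

/* assumed to live in the parallel vacuum DSM segment */
extern pg_atomic_uint32 *VacuumSharedCostBalance;

static void
shared_vacuum_delay_point(void)
{
    uint32      balance;

    /* publish the cost this process accumulated since the last check */
    balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
                                      (int32) VacuumCostBalance);
    VacuumCostBalance = 0;

    if (balance >= (uint32) VacuumCostLimit)
    {
        /* the process that tips the balance over the limit pays the sleep */
        pg_atomic_fetch_sub_u32(VacuumSharedCostBalance,
                                (int32) VacuumCostLimit);
        pg_usleep((long) (VacuumCostDelay * 1000));
    }
}

This is only to make the intended behaviour concrete; the actual patch
also has to avoid double-sleeping when several workers cross the limit
at the same time.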

[1] https://www.postgresql.org/message-id/CAFiTN-thU-z8f04jO7xGMu5yUUpTpsBTvBrFW6EhRf-jGvEz%3Dg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com

Regards,

--
Masahiko Sawada

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > > > What do you think?
> > > > > > >
> > > > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > > > Otherwise, I will try to write it after finishing the first one
> > > > > > > (approach b).
> > > > > > >
> > > > > > I have come up with the POC for approach (a).
> > >
> > > > > Can we compute the overall throttling (sleep time) in the operation
> > > > > separately for heap and index, then divide the index's sleep_time with
> > > > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > > > a bit easier to compare the data between parallel and non-parallel
> > > > > case.
> > > I have come up with a patch to compute the total delay during the
> > > vacuum.  So the idea of computing the total cost delay is
> > >
> > > Total cost delay = Total dealy of heap scan + Total dealy of
> > > index/worker;  Patch is attached for the same.
> > >
> > > I have prepared this patch on the latest patch of the parallel
> > > vacuum[1].  I have also rebased the patch for the approach [b] for
> > > dividing the vacuum cost limit and done some testing for computing the
> > > I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> > > and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> > > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> > > rebased on top of v31-0006, because v31-0006 is implementing the I/O
> > > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> > > doing the same with another approach.   But,
> > > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> > > well (just 1-2 lines conflict).
> > >
> > > Testing:  I have performed 2 tests, one with the same size indexes and
> > > second with the different size indexes and measured total I/O delay
> > > with the attached patch.
> > >
> > > Setup:
> > > VacuumCostDelay=10ms
> > > VacuumCostLimit=2000
> > >
> > > Test1 (Same size index):
> > > create table test(a int, b varchar, c varchar);
> > > create index idx1 on test(a);
> > > create index idx2 on test(b);
> > > create index idx3 on test(c);
> > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > > generate_series(1,500000) as i;
> > > delete from test where a < 200000;
> > >
> > >                       Vacuum (Head)                   Parallel Vacuum
> > >            Vacuum Cost Divide Patch
> > > Total Delay        1784 (ms)                           1398(ms)
> > >                  1938(ms)
> > >
> > >
> > > Test2 (Variable size dead tuple in index)
> > > create table test(a int, b varchar, c varchar);
> > > create index idx1 on test(a);
> > > create index idx2 on test(b) where a > 100000;
> > > create index idx3 on test(c) where a > 150000;
> > >
> > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > > generate_series(1,500000) as i;
> > > delete from test where a < 200000;
> > >
> > > Vacuum (Head)                                   Parallel Vacuum
> > >               Vacuum Cost Divide Patch
> > > Total Delay 1438 (ms)                               1029(ms)
> > >                    1529(ms)
> > >
> > >
> > > Conclusion:
> > > 1. The tests prove that the total I/O delay is significantly less with
> > > the parallel vacuum.
> > > 2. With the vacuum cost divide the problem is solved but the delay bit
> > > more compared to the non-parallel version.  The reason could be the
> > > problem discussed at[2], but it needs further investigation.
> > >
> > > Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> > > will also try to test different types of indexes.
> > >
> >
> > Thank you for testing!
> >
> > I realized that v31-0006 patch doesn't work fine so I've attached the
> > updated version patch that also incorporated some comments I got so
> > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > test the total delay time.
> >
>
> FWIW I'd like to share the results of total delay time evaluation of
> approach (a) (shared cost balance). I used the same workloads that
> Dilip shared and set vacuum_cost_delay to 10. The results of two test
> cases are here:
>
> * Test1
> normal      : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552)
> 2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477)
> 1 worker   : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811)
>
> * Test2
> normal      : 1530 ms (hit 30645, miss 2, dirty 3, total 30650)
> 2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
> 1 worker   : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
>
> 'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer
> misses and flushing dirty buffer, respectively. 'total' is the sum of
> these three values.
>
> In this evaluation I expect that parallel vacuum cases delay time as
> much as the time of normal vacuum because the total number of pages to
> vacuum is the same and we have the shared cost balance value and each
> workers decide to sleep based on that value. According to the above
> Test1 results, we can see that there is a big difference in the total
> delay time among  these cases (normal vacuum case is shortest), but
> the cause of this is that parallel vacuum had to to flush more dirty
> pages. Actually after increased shared_buffer I got expected results:
>
> * Test1 (after increased shared_buffers)
> normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
>
> I updated the patch that computes the total cost delay shared by
> Dilip[1] so that it collects the number of buffer hits and so on, and
> have attached it. It can be applied on top of my latest patch set[1].

Thanks, Sawada-san.  In my next test, I will use this updated patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 3:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > > > > Otherwise, I will try to write it after finishing the first one
> > > > > > > > (approach b).
> > > > > > > >
> > > > > > > I have come up with the POC for approach (a).
> > > >
> > > > > > Can we compute the overall throttling (sleep time) in the operation
> > > > > > separately for heap and index, then divide the index's sleep_time with
> > > > > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > > > > a bit easier to compare the data between parallel and non-parallel
> > > > > > case.
> > > > I have come up with a patch to compute the total delay during the
> > > > vacuum.  So the idea of computing the total cost delay is
> > > >
> > > > Total cost delay = Total dealy of heap scan + Total dealy of
> > > > index/worker;  Patch is attached for the same.
> > > >
> > > > I have prepared this patch on the latest patch of the parallel
> > > > vacuum[1].  I have also rebased the patch for the approach [b] for
> > > > dividing the vacuum cost limit and done some testing for computing the
> > > > I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> > > > and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> > > > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> > > > rebased on top of v31-0006, because v31-0006 is implementing the I/O
> > > > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> > > > doing the same with another approach.   But,
> > > > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> > > > well (just 1-2 lines conflict).
> > > >
> > > > Testing:  I have performed 2 tests, one with the same size indexes and
> > > > second with the different size indexes and measured total I/O delay
> > > > with the attached patch.
> > > >
> > > > Setup:
> > > > VacuumCostDelay=10ms
> > > > VacuumCostLimit=2000
> > > >
> > > > Test1 (Same size index):
> > > > create table test(a int, b varchar, c varchar);
> > > > create index idx1 on test(a);
> > > > create index idx2 on test(b);
> > > > create index idx3 on test(c);
> > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > > > generate_series(1,500000) as i;
> > > > delete from test where a < 200000;
> > > >
> > > >                       Vacuum (Head)                   Parallel Vacuum
> > > >            Vacuum Cost Divide Patch
> > > > Total Delay        1784 (ms)                           1398(ms)
> > > >                  1938(ms)
> > > >
> > > >
> > > > Test2 (Variable size dead tuple in index)
> > > > create table test(a int, b varchar, c varchar);
> > > > create index idx1 on test(a);
> > > > create index idx2 on test(b) where a > 100000;
> > > > create index idx3 on test(c) where a > 150000;
> > > >
> > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > > > generate_series(1,500000) as i;
> > > > delete from test where a < 200000;
> > > >
> > > > Vacuum (Head)                                   Parallel Vacuum
> > > >               Vacuum Cost Divide Patch
> > > > Total Delay 1438 (ms)                               1029(ms)
> > > >                    1529(ms)
> > > >
> > > >
> > > > Conclusion:
> > > > 1. The tests prove that the total I/O delay is significantly less with
> > > > the parallel vacuum.
> > > > 2. With the vacuum cost divide the problem is solved but the delay bit
> > > > more compared to the non-parallel version.  The reason could be the
> > > > problem discussed at[2], but it needs further investigation.
> > > >
> > > > Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> > > > will also try to test different types of indexes.
> > > >
> > >
> > > Thank you for testing!
> > >
> > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > updated version patch that also incorporated some comments I got so
> > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > test the total delay time.
> > >
> >
> > FWIW I'd like to share the results of total delay time evaluation of
> > approach (a) (shared cost balance). I used the same workloads that
> > Dilip shared and set vacuum_cost_delay to 10. The results of two test
> > cases are here:
> >
> > * Test1
> > normal      : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552)
> > 2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477)
> > 1 worker   : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811)
> >
> > * Test2
> > normal      : 1530 ms (hit 30645, miss 2, dirty 3, total 30650)
> > 2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
> > 1 worker   : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
> >
> > 'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer
> > misses and flushing dirty buffer, respectively. 'total' is the sum of
> > these three values.
> >
> > In this evaluation I expect that parallel vacuum cases delay time as
> > much as the time of normal vacuum because the total number of pages to
> > vacuum is the same and we have the shared cost balance value and each
> > workers decide to sleep based on that value. According to the above
> > Test1 results, we can see that there is a big difference in the total
> > delay time among  these cases (normal vacuum case is shortest), but
> > the cause of this is that parallel vacuum had to to flush more dirty
> > pages. Actually after increased shared_buffer I got expected results:
> >
> > * Test1 (after increased shared_buffers)
> > normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> >
> > I updated the patch that computes the total cost delay shared by
> > Dilip[1] so that it collects the number of buffer hits and so on, and
> > have attached it. It can be applied on top of my latest patch set[1].
>
> Thanks, Sawada-san.  In my next test, I will use this updated patch.
>
A few comments on the latest patch.

+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
...
+
+ stats = (IndexBulkDeleteResult **)
+ palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
+
+ if (lvshared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = lvshared->maintenance_work_mem_worker;

So for a worker we set the new value of maintenance_work_mem.  But if
the leader is participating in the index vacuuming, shouldn't we set
the new value of maintenance_work_mem for the leader as well?
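
For illustration, a rough sketch of what I mean on the leader side
(leader_participates is a hypothetical flag; lvshared and
maintenance_work_mem_worker are the fields from the quoted code):

/*
 * Sketch only: if the leader also runs index vacuuming, lower its own
 * maintenance_work_mem the same way the workers do, so the leader plus
 * the workers stay within the configured memory budget.
 */
if (leader_participates && lvshared->maintenance_work_mem_worker > 0)
    maintenance_work_mem = lvshared->maintenance_work_mem_worker;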


+static void
+prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
+{
+ char *p = (char *) GetSharedIndStats(lvshared);
+ int vac_work_mem = IsAutoVacuumWorkerProcess() &&
+ autovacuum_work_mem != -1 ?
+ autovacuum_work_mem : maintenance_work_mem;
+ int nindexes_mwm = 0;
+ int i;

Can this ever be called from an autovacuum worker?  I think instead of
adding handling for the autovacuum worker we can have an assert.
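
Something along these lines (a sketch of the relevant fragment only):

    int     vac_work_mem;

    /* sketch: this path is not reached by autovacuum workers as of now */
    Assert(!IsAutoVacuumWorkerProcess());
    vac_work_mem = maintenance_work_mem;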

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Oct 28, 2019 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > I haven't yet read the new set of the patch.  But, I have noticed one
> > thing.  That we are getting the size of the statistics using the AM
> > routine.  But, we are copying those statistics from local memory to
> > the shared memory directly using the memcpy.   Wouldn't it be a good
> > idea to have an AM specific routine to get it copied from the local
> > memory to the shared memory?  I am not sure it is worth it or not but
> > my thought behind this point is that it will give AM to have local
> > stats in any form ( like they can store a pointer in that ) but they
> > can serialize that while copying to shared stats.  And, later when
> > shared stats are passed back to the Am then it can deserialize in its
> > local form and use it.
> >
>
> You have a point, but after changing the gist index, we don't have any
> current usage for indexes that need something like that. So, on one
> side there is some value in having an API to copy the stats, but on
> the other side without having clear usage of an API, it might not be
> good to expose a new API for the same.   I think we can expose such an
> API in the future if there is a need for the same.  Do you or anyone
> know of any external IndexAM that has such a need?
>
> Few minor comments while glancing through the latest patchset.
>
> 1. I think you can merge 0001*, 0002*, 0003* patch into one patch as
> all three expose new variable/function from IndexAmRoutine.

Fixed.

>
> 2.
> +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> +{
> + char *p = (char *) GetSharedIndStats(lvshared);
> + int vac_work_mem = IsAutoVacuumWorkerProcess() &&
> + autovacuum_work_mem != -1 ?
> + autovacuum_work_mem : maintenance_work_mem;
>
> I think this function won't be called from AutoVacuumWorkerProcess at
> least not as of now, so isn't it a better idea to have an Assert for
> it?

Fixed.

>
> 3.
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>
> This function is for performing a parallel operation on the index, so
> why to start with heap?

Because parallel vacuum supports only indexes that are created on heaps.

>  It is better to name it as
> index_parallel_vacuum_main or simply parallel_vacuum_main.

I'm concerned that both names, index_parallel_vacuum_main and
parallel_vacuum_main, seem too generic given that this code is
heap-specific.

>
> 4.
> /* useindex = true means two-pass strategy; false means one-pass */
> @@ -128,17 +280,12 @@ typedef struct LVRelStats
>   BlockNumber pages_removed;
>   double tuples_deleted;
>   BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
> - /* List of TIDs of tuples we intend to delete */
> - /* NB: this list is ordered by TID address */
> - int num_dead_tuples; /* current # of entries */
> - int max_dead_tuples; /* # slots allocated in array */
> - ItemPointer dead_tuples; /* array of ItemPointerData */
> + LVDeadTuples *dead_tuples;
>   int num_index_scans;
>   TransactionId latestRemovedXid;
>   bool lock_waiter_detected;
>  } LVRelStats;
>
> -
>  /* A few variables that don't seem worth passing around as parameters */
>  static int elevel = -1;
>
> It seems like a spurious line removal.

Fixed.

The above comments are incorporated in the latest patch set (v32)[1].

[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Actually after increased shared_buffer I got expected results:
>
> * Test1 (after increased shared_buffers)
> normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
>
> I updated the patch that computes the total cost delay shared by
> Dilip[1] so that it collects the number of buffer hits and so on, and
> have attached it. It can be applied on top of my latest patch set[1].

I tried to repeat the test to see the I/O delay with
v32-0004-PoC-shared-vacuum-cost-balance.patch [1], with shared_buffers
set to 4GB.  I recreated the database and restarted the server before
each run.  But I could not reproduce the same I/O delay, and the cost
is also not the same.  Can you please tell me what shared_buffers value
you used?

Test1 (4GB shared buffers)
normal:     stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
1 worker:   stats delay 1821.255000, hit 78184, miss 2, dirty 14095, total 92281
2 workers:  stats delay 2224.415000, hit 86482, miss 2, dirty 17665, total 104149

[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Actually after increased shared_buffer I got expected results:
> >
> > * Test1 (after increased shared_buffers)
> > normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> >
> > I updated the patch that computes the total cost delay shared by
> > Dilip[1] so that it collects the number of buffer hits and so on, and
> > have attached it. It can be applied on top of my latest patch set[1].

While reading your modified patch (PoC-delay-stats.patch), I noticed
that in my patch I used the formula below to compute the total delay:

total delay = delay in heap scan + (total delay of index scan / nworkers)

But in your patch it is just the total sum of all delays.  IMHO the
total sleep time during the index vacuum phase must be divided by the
number of workers, because even if at some point all the workers go to
sleep at once (e.g. for 10 msec), the delay in I/O is only 10 msec, not
30 msec.  I think the same is discussed upthread[1].

[1] https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com
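
For illustration, the projection can be written as the following sketch
(projected_total_delay is a hypothetical helper, not code from either
patch):

/*
 * Sketch of the projection: the index-vacuuming sleep happens on nworkers
 * processes concurrently, so only its per-worker share is added to the
 * heap-scan sleep when comparing against a sequential vacuum.
 */
static double
projected_total_delay(double heap_scan_delay_ms,
                      double index_vacuum_delay_ms,
                      int nworkers)
{
    return heap_scan_delay_ms +
           index_vacuum_delay_ms / (nworkers > 0 ? nworkers : 1);
}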

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Oct 31, 2019 at 3:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > Actually after increased shared_buffer I got expected results:
> > >
> > > * Test1 (after increased shared_buffers)
> > > normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> > >
> > > I updated the patch that computes the total cost delay shared by
> > > Dilip[1] so that it collects the number of buffer hits and so on, and
> > > have attached it. It can be applied on top of my latest patch set[1].
>
> While reading your modified patch (PoC-delay-stats.patch), I have
> noticed that in my patch I used below formulae to compute the total
> delay
> total delay = delay in heap scan + (total delay of index scan
> /nworkers). But, in your patch, I can see that it is just total sum of
> all delay.  IMHO, the total sleep time during the index vacuum phase
> must be divided by the number of workers, because even if at some
> point, all the workers go for sleep (e.g. 10 msec) then the delay in
> I/O will be only for 10msec not 30 msec.  I think the same is
> discussed upthread[1]
>

I think the two approaches make parallel vacuum workers wait in
different ways: in approach (a) the vacuum delay works as if the vacuum
were performed by a single process, whereas in approach (b) the vacuum
delay works for each worker independently.

Suppose that the total number of blocks to vacuum is 10,000, the cost
per block is 10, the cost limit is 200 and the sleep time is 5 ms. In a
single-process vacuum the total sleep time is 2,500 ms (= (10,000 * 10
/ 200) * 5). Approach (a) is the same, 2,500 ms, because all parallel
vacuum workers use the shared balance value and a worker sleeps once
that value exceeds the limit. In approach (b), since the cost limit is
divided evenly, each worker's limit is 40 (e.g. with a parallel degree
of 5). Assuming each worker processes blocks evenly, the total sleep
time of all workers is 12,500 ms (= (2,000 * 10 / 40) * 5 * 5). I think
that's why we can compute the sleep time of approach (b) by dividing
the total value by the number of parallel workers.
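
The same arithmetic written out as a small standalone sketch (numbers
taken from the example above):

#include <stdio.h>

int
main(void)
{
    const double total_blocks   = 10000;
    const double cost_per_block = 10;
    const double cost_limit     = 200;  /* per process */
    const double delay_ms       = 5;
    const double nworkers       = 5;

    /* single-process vacuum; approach (a) with a shared balance is the same */
    double shared = total_blocks * cost_per_block / cost_limit * delay_ms;

    /* approach (b): limit and blocks divided evenly, summed over all workers */
    double divided = (total_blocks / nworkers) * cost_per_block /
                     (cost_limit / nworkers) * delay_ms * nworkers;

    printf("shared: %.0f ms, divided: %.0f ms\n", shared, divided); /* 2500 vs 12500 */
    return 0;
}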

IOW approach (b) makes parallel vacuum delay much more than both normal
vacuum and parallel vacuum with approach (a), even with the same
settings. Which behavior do we expect? I thought the vacuum delay for
parallel vacuum should work as if it were a single-process vacuum, as
we did for memory usage. I might be missing something. If we prefer
approach (b), I should change the patch so that the leader process
divides the cost limit evenly.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Oct 31, 2019 at 3:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > Actually after increased shared_buffer I got expected results:
> > > >
> > > > * Test1 (after increased shared_buffers)
> > > > normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > >
> > > > I updated the patch that computes the total cost delay shared by
> > > > Dilip[1] so that it collects the number of buffer hits and so on, and
> > > > have attached it. It can be applied on top of my latest patch set[1].
> >
> > While reading your modified patch (PoC-delay-stats.patch), I have
> > noticed that in my patch I used below formulae to compute the total
> > delay
> > total delay = delay in heap scan + (total delay of index scan
> > /nworkers). But, in your patch, I can see that it is just total sum of
> > all delay.  IMHO, the total sleep time during the index vacuum phase
> > must be divided by the number of workers, because even if at some
> > point, all the workers go for sleep (e.g. 10 msec) then the delay in
> > I/O will be only for 10msec not 30 msec.  I think the same is
> > discussed upthread[1]
> >
>
> I think that two approaches make parallel vacuum worker wait in
> different way: in approach(a) the vacuum delay works as if vacuum is
> performed by single process, on the other hand in approach(b) the
> vacuum delay work for each workers independently.
>
> Suppose that the total number of blocks to vacuum is 10,000 blocks,
> the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> ms. In single process vacuum the total sleep time is 2,500ms (=
> (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> Because all parallel vacuum workers use the shared balance value and a
> worker sleeps once the balance value exceeds the limit. In
> approach(b), since the cost limit is divided evenly the value of each
> workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> processes blocks  evenly,  the total sleep time of all workers is
> 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> compute the sleep time of approach(b) by dividing the total value by
> the number of parallel workers.
>
> IOW the approach(b) makes parallel vacuum delay much more than normal
> vacuum and parallel vacuum with approach(a) even with the same
> settings. Which behaviors do we expect? I thought the vacuum delay for
> parallel vacuum should work as if it's a single process vacuum as we
> did for memory usage. I might be missing something. If we prefer
> approach(b) I should change the patch so that the leader process
> divides the cost limit evenly.
>
I have repeated the same tests (test1 and test2)[1] with a higher
shared_buffers setting (1GB).  I have used the same formula for
computing the total delay: heap scan delay + index vacuuming delay /
workers.  In my opinion, multiple workers are doing I/O here, so the
total delay is also a multiple of the number of workers; if we want to
compare the delay with the sequential vacuum, we should divide the
total delay by the number of workers.  I am not sure whether computing
the total delay is the right way to measure the I/O throttling or not.
But I support approach (b) of dividing the I/O limit, because
autovacuum workers already operate this way.

test1:
normal:   stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146, total 79102 (cost divide patch)
2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036, total 78994 (cost divide patch)
1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066, total 92252 (share cost patch)
2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806, total 104290 (share cost patch)

test2:
normal:   stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total 40513 (cost divide patch)
2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total 40518 (cost divide patch)
1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total 42589 (share cost patch)
2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total 42871 (share cost patch)

So with a higher shared_buffers setting, approach (b) gives the same
total delay as the sequential vacuum, while approach (a) gives a bit
less total delay.  Note that I have used the same formula for computing
the total delay for both approaches, although Sawada-san explained in
the mail above that it may not be the right way to compute the total
delay for approach (a).  My take is that whether we are working with a
shared cost or we are dividing the cost, the delay must be divided by
the number of workers in the parallel phase. @Amit Kapila, what is your
opinion on this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Oct 28, 2019 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Oct 28, 2019 at 12:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > >
> > > I haven't yet read the new set of the patch.  But, I have noticed one
> > > thing.  That we are getting the size of the statistics using the AM
> > > routine.  But, we are copying those statistics from local memory to
> > > the shared memory directly using the memcpy.   Wouldn't it be a good
> > > idea to have an AM specific routine to get it copied from the local
> > > memory to the shared memory?  I am not sure it is worth it or not but
> > > my thought behind this point is that it will give AM to have local
> > > stats in any form ( like they can store a pointer in that ) but they
> > > can serialize that while copying to shared stats.  And, later when
> > > shared stats are passed back to the Am then it can deserialize in its
> > > local form and use it.
> > >
> >
> > You have a point, but after changing the gist index, we don't have any
> > current usage for indexes that need something like that. So, on one
> > side there is some value in having an API to copy the stats, but on
> > the other side without having clear usage of an API, it might not be
> > good to expose a new API for the same.   I think we can expose such an
> > API in the future if there is a need for the same.
> I agree with the point.  But, the current patch exposes an API for
> estimating the size for the statistics.  So IMHO, either we expose
> both APIs for estimating the size of the stats and copy the stats or
> none.  Am I missing something here?
>

I think the first one is a must as things stand today because
otherwise we won't be able to copy the stats.  The second one (exposing
an API to copy the stats) is good to have, but there is no immediate
use for it.  We could expose the second API with future needs in mind,
but as there is no valid case as of now it would be difficult to test,
and we are also not sure whether any IndexAM will require such an API
in the future.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I think that two approaches make parallel vacuum worker wait in
> different way: in approach(a) the vacuum delay works as if vacuum is
> performed by single process, on the other hand in approach(b) the
> vacuum delay work for each workers independently.
>
> Suppose that the total number of blocks to vacuum is 10,000 blocks,
> the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> ms. In single process vacuum the total sleep time is 2,500ms (=
> (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> Because all parallel vacuum workers use the shared balance value and a
> worker sleeps once the balance value exceeds the limit. In
> approach(b), since the cost limit is divided evenly the value of each
> workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> processes blocks  evenly,  the total sleep time of all workers is
> 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> compute the sleep time of approach(b) by dividing the total value by
> the number of parallel workers.
>
> IOW the approach(b) makes parallel vacuum delay much more than normal
> vacuum and parallel vacuum with approach(a) even with the same
> settings. Which behaviors do we expect?
>

Yeah, this is an important thing to decide.  I don't think that the
conclusion you are drawing is correct, because if that were true then
the same would apply to the current autovacuum work division, where we
divide the cost_limit among workers but the cost_delay is the same (see
autovac_balance_cost).  Basically, if we consider the delay time of
each worker independently, then it would appear that the parallel
vacuum delay with approach (b) is more, but that is true only if the
workers run serially, which is not the case.
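
For comparison, a minimal sketch of that division, assuming the usual
PostgreSQL headers; the function name and parameters are illustrative
only, and autovac_balance_cost() is the existing analogue for
autovacuum workers:

/*
 * Sketch of approach (b): split the cost limit across the processes that
 * do index vacuuming and keep the delay unchanged, similar in spirit to
 * what autovac_balance_cost() does for autovacuum workers.
 */
static void
divide_vacuum_cost_limit(int nworkers, bool leader_participates,
                         int *worker_cost_limit, double *worker_cost_delay)
{
    int         nprocs = nworkers + (leader_participates ? 1 : 0);

    *worker_cost_limit = Max(VacuumCostLimit / nprocs, 1);
    *worker_cost_delay = VacuumCostDelay;       /* not divided */
}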

> I thought the vacuum delay for
> parallel vacuum should work as if it's a single process vacuum as we
> did for memory usage. I might be missing something. If we prefer
> approach(b) I should change the patch so that the leader process
> divides the cost limit evenly.
>

I am also not completely sure which approach is better, but I slightly
lean towards approach (b).  I think we need input from some other
people as well.  I will start a separate thread to discuss this and
see if that helps get input from others.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > I think that two approaches make parallel vacuum worker wait in
> > different way: in approach(a) the vacuum delay works as if vacuum is
> > performed by single process, on the other hand in approach(b) the
> > vacuum delay work for each workers independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > ms. In single process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > Because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach(b), since the cost limit is divided evenly the value of each
> > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > processes blocks  evenly,  the total sleep time of all workers is
> > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach(b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW the approach(b) makes parallel vacuum delay much more than normal
> > vacuum and parallel vacuum with approach(a) even with the same
> > settings. Which behaviors do we expect? I thought the vacuum delay for
> > parallel vacuum should work as if it's a single process vacuum as we
> > did for memory usage. I might be missing something. If we prefer
> > approach(b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
> I have repeated the same test (test1 and test2)[1] with a higher
> shared buffer (1GB).  Currently, I have used the same formula for
> computing the total delay
> heap scan delay + index vacuuming delay / workers.  Because, In my
> opinion, multiple workers are doing I/O here so the total delay should
> also be in multiple
> of the number of workers.  So if we want to compare the delay with the
> sequential vacuum then we should divide total delay by the number of
> workers.  But, I am not
> sure whether computing the total delay is the right way to compute the
> I/O throttling or not.  But, I support the approach (b) for dividing
> the I/O limit because
> auto vacuum workers are already operating with this approach.
>
> test1:
> normal: stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
> 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146,
> total 79102 (cost divide patch)
> 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036,
> total 78994 (cost divide patch)
> 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066,
> total 92252 (share cost patch)
> 2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806,
> total 104290 (share cost patch)
>
> test2:
> normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
> 1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total
> 40513 (cost divide patch)
> 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total
> 40518 (cost divide patch)
> 1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total
> 42589 (share cost patch)
> 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total
> 42871 (share cost patch)
>
> So with higher, shared buffers,  I can see with approach (b) we can
> see the same total delay.  With approach (a) I can see a bit less
> total delay.  But, a point to be noted that I have used the same
> formulae for computing the total delay for both the approaches.  But,
> Sawada-san explained in the above mail that it may not be the right
> way to computing the total delay for the approach (a).  But my take is
> that whether we are working with shared cost or we are dividing the
> cost, the delay must be divided by number of workers in the parallel
> phase.
>

Why do you think so?  I think with approach (b), if all the workers are
doing an equal amount of I/O, they will probably sleep at the same time,
whereas with approach (a) each of them will sleep at a different time.
So dividing the delay in approach (b) probably makes more sense.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 4, 2019 at 10:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > I think that two approaches make parallel vacuum worker wait in
> > > different way: in approach(a) the vacuum delay works as if vacuum is
> > > performed by single process, on the other hand in approach(b) the
> > > vacuum delay work for each workers independently.
> > >
> > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > > ms. In single process vacuum the total sleep time is 2,500ms (=
> > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > > Because all parallel vacuum workers use the shared balance value and a
> > > worker sleeps once the balance value exceeds the limit. In
> > > approach(b), since the cost limit is divided evenly the value of each
> > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > > processes blocks  evenly,  the total sleep time of all workers is
> > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > compute the sleep time of approach(b) by dividing the total value by
> > > the number of parallel workers.
> > >
> > > IOW the approach(b) makes parallel vacuum delay much more than normal
> > > vacuum and parallel vacuum with approach(a) even with the same
> > > settings. Which behaviors do we expect? I thought the vacuum delay for
> > > parallel vacuum should work as if it's a single process vacuum as we
> > > did for memory usage. I might be missing something. If we prefer
> > > approach(b) I should change the patch so that the leader process
> > > divides the cost limit evenly.
> > >
> > I have repeated the same test (test1 and test2)[1] with a higher
> > shared buffer (1GB).  Currently, I have used the same formula for
> > computing the total delay
> > heap scan delay + index vacuuming delay / workers.  Because, In my
> > opinion, multiple workers are doing I/O here so the total delay should
> > also be in multiple
> > of the number of workers.  So if we want to compare the delay with the
> > sequential vacuum then we should divide total delay by the number of
> > workers.  But, I am not
> > sure whether computing the total delay is the right way to compute the
> > I/O throttling or not.  But, I support the approach (b) for dividing
> > the I/O limit because
> > auto vacuum workers are already operating with this approach.
> >
> > test1:
> > normal: stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
> > 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146,
> > total 79102 (cost divide patch)
> > 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036,
> > total 78994 (cost divide patch)
> > 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066,
> > total 92252 (share cost patch)
> > 2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806,
> > total 104290 (share cost patch)
> >
> > test2:
> > normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
> > 1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total
> > 40513 (cost divide patch)
> > 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total
> > 40518 (cost divide patch)
> > 1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total
> > 42589 (share cost patch)
> > 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total
> > 42871 (share cost patch)
> >
> > So with higher, shared buffers,  I can see with approach (b) we can
> > see the same total delay.  With approach (a) I can see a bit less
> > total delay.  But, a point to be noted that I have used the same
> > formulae for computing the total delay for both the approaches.  But,
> > Sawada-san explained in the above mail that it may not be the right
> > way to computing the total delay for the approach (a).  But my take is
> > that whether we are working with shared cost or we are dividing the
> > cost, the delay must be divided by number of workers in the parallel
> > phase.
> >
>
> Why do you think so?  I think with approach (b) if all the workers are
> doing equal amount of I/O, they will probably sleep at the same time
> whereas with approach (a) each of them will sleep at different times.
> So, probably dividing the delay in approach (b) makes more sense.

Just to be clear, I did not mean that we divide the sleep time for
each worker.  I meant how to project the total delay in the test
patch.  If we directly want to compare the sleep time of the
sequential vs the parallel vacuum, it's not fair to just compare the
total sleep time; when multiple workers are working in parallel,
shouldn't we consider their average sleep time instead?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 4, 2019 at 10:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I think that two approaches make parallel vacuum worker wait in
> > different way: in approach(a) the vacuum delay works as if vacuum is
> > performed by single process, on the other hand in approach(b) the
> > vacuum delay work for each workers independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > ms. In single process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > Because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach(b), since the cost limit is divided evenly the value of each
> > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > processes blocks  evenly,  the total sleep time of all workers is
> > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach(b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW the approach(b) makes parallel vacuum delay much more than normal
> > vacuum and parallel vacuum with approach(a) even with the same
> > settings. Which behaviors do we expect?
> >
>
> Yeah, this is an important thing to decide.  I don't think that the
> conclusion you are drawing is correct because it that is true then the
> same applies to the current autovacuum work division where we divide
> the cost_limit among workers but the cost_delay is same (see
> autovac_balance_cost).  Basically, if we consider the delay time of
> each worker independently, then it would appear that a parallel vacuum
> delay with approach (b) is more, but that is true only if the workers
> run serially which is not true.
>
> > I thought the vacuum delay for
> > parallel vacuum should work as if it's a single process vacuum as we
> > did for memory usage. I might be missing something. If we prefer
> > approach(b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
>
> I am also not completely sure which approach is better but I slightly
> lean towards approach (b).  I think we need input from some other
> people as well.  I will start a separate thread to discuss this and
> see if that helps to get the input from others.

+1


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 4 Nov 2019 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I think that two approaches make parallel vacuum worker wait in
> > different way: in approach(a) the vacuum delay works as if vacuum is
> > performed by single process, on the other hand in approach(b) the
> > vacuum delay work for each workers independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > ms. In single process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > Because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach(b), since the cost limit is divided evenly the value of each
> > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > processes blocks  evenly,  the total sleep time of all workers is
> > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach(b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW the approach(b) makes parallel vacuum delay much more than normal
> > vacuum and parallel vacuum with approach(a) even with the same
> > settings. Which behaviors do we expect?
> >
>
> Yeah, this is an important thing to decide.  I don't think that the
> conclusion you are drawing is correct because it that is true then the
> same applies to the current autovacuum work division where we divide
> the cost_limit among workers but the cost_delay is same (see
> autovac_balance_cost).  Basically, if we consider the delay time of
> each worker independently, then it would appear that a parallel vacuum
> delay with approach (b) is more, but that is true only if the workers
> run serially which is not true.
>
> > I thought the vacuum delay for
> > parallel vacuum should work as if it's a single process vacuum as we
> > did for memory usage. I might be missing something. If we prefer
> > approach(b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
>
> I am also not completely sure which approach is better but I slightly
> lean towards approach (b).

Can we get the same sleep time as approach (b) if we divide the cost
limit by the number of workers and use the shared cost balance (i.e.
approach (a) with a divided cost limit)? Currently approach (b) seems
better, but I'm concerned that it might unnecessarily delay vacuum if
some indexes are very small or if bulk-deletion of an index does
almost nothing, as for brin.
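
For what it's worth, a small standalone sketch of the arithmetic in the
example above (the constants are the example's numbers; nothing here is
taken from the patch):

#include <stdio.h>

int
main(void)
{
    const double blocks = 10000, cost_per_block = 10;
    const double cost_limit = 200, delay_ms = 5;
    const int    nworkers = 5;

    /* Single-process vacuum; approach (a) with a shared balance behaves
     * the same because the balance is accumulated globally. */
    double single = (blocks * cost_per_block / cost_limit) * delay_ms;

    /* Approach (b): both the blocks and the cost limit divided evenly. */
    double per_worker = ((blocks / nworkers) * cost_per_block /
                         (cost_limit / nworkers)) * delay_ms;

    printf("single process / approach (a): %.0f ms\n", single);
    printf("approach (b), per worker:      %.0f ms\n", per_worker);
    printf("approach (b), summed:          %.0f ms\n", per_worker * nworkers);

    /* Prints 2500, 2500 and 12500; the summed figure only matters if the
     * workers were to run one after another rather than concurrently. */
    return 0;
}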

>
>   I think we need input from some other
> people as well.  I will start a separate thread to discuss this and
> see if that helps to get the input from others.

+1

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 4 Nov 2019 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I think that two approaches make parallel vacuum worker wait in
> > > different way: in approach(a) the vacuum delay works as if vacuum is
> > > performed by single process, on the other hand in approach(b) the
> > > vacuum delay work for each workers independently.
> > >
> > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > > ms. In single process vacuum the total sleep time is 2,500ms (=
> > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > > Because all parallel vacuum workers use the shared balance value and a
> > > worker sleeps once the balance value exceeds the limit. In
> > > approach(b), since the cost limit is divided evenly the value of each
> > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > > processes blocks  evenly,  the total sleep time of all workers is
> > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > compute the sleep time of approach(b) by dividing the total value by
> > > the number of parallel workers.
> > >
> > > IOW the approach(b) makes parallel vacuum delay much more than normal
> > > vacuum and parallel vacuum with approach(a) even with the same
> > > settings. Which behaviors do we expect?
> > >
> >
> > Yeah, this is an important thing to decide.  I don't think that the
> > conclusion you are drawing is correct because if that is true then the
> > same applies to the current autovacuum work division where we divide
> > the cost_limit among workers but the cost_delay is same (see
> > autovac_balance_cost).  Basically, if we consider the delay time of
> > each worker independently, then it would appear that a parallel vacuum
> > delay with approach (b) is more, but that is true only if the workers
> > run serially which is not true.
> >
> > > I thought the vacuum delay for
> > > parallel vacuum should work as if it's a single process vacuum as we
> > > did for memory usage. I might be missing something. If we prefer
> > > approach(b) I should change the patch so that the leader process
> > > divides the cost limit evenly.
> > >
> >
> > I am also not completely sure which approach is better but I slightly
> > lean towards approach (b).
>
> Can we get the same sleep time as approach (b) if we divide the cost
> limit by the number of workers and have the shared cost balance (i.e.
> approach (a) with dividing the cost limit)? Currently the approach (b)
> seems better but I'm concerned that it might unnecessarily delay
> vacuum if some indexes are very small or bulk-deletions of indexes
> does almost nothing such as brin.

Are you worried that some of the workers might not have much I/O to do,
but we still divide the cost limit equally? If that is the case, then
isn't that also the case with the autovacuum workers?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 4 Nov 2019 at 17:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 4 Nov 2019 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I think that two approaches make parallel vacuum worker wait in
> > > > different way: in approach(a) the vacuum delay works as if vacuum is
> > > > performed by single process, on the other hand in approach(b) the
> > > > vacuum delay work for each workers independently.
> > > >
> > > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > > > ms. In single process vacuum the total sleep time is 2,500ms (=
> > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > > > Because all parallel vacuum workers use the shared balance value and a
> > > > worker sleeps once the balance value exceeds the limit. In
> > > > approach(b), since the cost limit is divided evenly the value of each
> > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > > > processes blocks  evenly,  the total sleep time of all workers is
> > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > > compute the sleep time of approach(b) by dividing the total value by
> > > > the number of parallel workers.
> > > >
> > > > IOW the approach(b) makes parallel vacuum delay much more than normal
> > > > vacuum and parallel vacuum with approach(a) even with the same
> > > > settings. Which behaviors do we expect?
> > > >
> > >
> > > Yeah, this is an important thing to decide.  I don't think that the
> > > conclusion you are drawing is correct because if that is true then the
> > > same applies to the current autovacuum work division where we divide
> > > the cost_limit among workers but the cost_delay is same (see
> > > autovac_balance_cost).  Basically, if we consider the delay time of
> > > each worker independently, then it would appear that a parallel vacuum
> > > delay with approach (b) is more, but that is true only if the workers
> > > run serially which is not true.
> > >
> > > > I thought the vacuum delay for
> > > > parallel vacuum should work as if it's a single process vacuum as we
> > > > did for memory usage. I might be missing something. If we prefer
> > > > approach(b) I should change the patch so that the leader process
> > > > divides the cost limit evenly.
> > > >
> > >
> > > I am also not completely sure which approach is better but I slightly
> > > lean towards approach (b).
> >
> > Can we get the same sleep time as approach (b) if we divide the cost
> > limit by the number of workers and have the shared cost balance (i.e.
> > approach (a) with dividing the cost limit)? Currently the approach (b)
> > seems better but I'm concerned that it might unnecessarily delay
> > vacuum if some indexes are very small or bulk-deletions of indexes
> > does almost nothing such as brin.
>
> Are you worried that some of the workers might not have much I/O to do
> but still we divide the cost limit equally?

Yes.

> If that is the case then
> that is the case with the auto vacuum workers also right?

I think it is not right, because we rebalance the cost after an
autovacuum worker finishes. So, as Amit mentioned on the new thread, we
might need to make parallel vacuum workers notify the leader once they
exit so that it can rebalance the cost.

Regards,

--
Masahiko Sawada      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 4, 2019 at 2:11 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 4 Nov 2019 at 17:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 4 Nov 2019 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > I think that two approaches make parallel vacuum worker wait in
> > > > > different way: in approach(a) the vacuum delay works as if vacuum is
> > > > > performed by single process, on the other hand in approach(b) the
> > > > > vacuum delay work for each workers independently.
> > > > >
> > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > > > > ms. In single process vacuum the total sleep time is 2,500ms (=
> > > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > > > > Because all parallel vacuum workers use the shared balance value and a
> > > > > worker sleeps once the balance value exceeds the limit. In
> > > > > approach(b), since the cost limit is divided evenly the value of each
> > > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > > > > processes blocks  evenly,  the total sleep time of all workers is
> > > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > > > compute the sleep time of approach(b) by dividing the total value by
> > > > > the number of parallel workers.
> > > > >
> > > > > IOW the approach(b) makes parallel vacuum delay much more than normal
> > > > > vacuum and parallel vacuum with approach(a) even with the same
> > > > > settings. Which behaviors do we expect?
> > > > >
> > > >
> > > > Yeah, this is an important thing to decide.  I don't think that the
> > > > conclusion you are drawing is correct because if that is true then the
> > > > same applies to the current autovacuum work division where we divide
> > > > the cost_limit among workers but the cost_delay is same (see
> > > > autovac_balance_cost).  Basically, if we consider the delay time of
> > > > each worker independently, then it would appear that a parallel vacuum
> > > > delay with approach (b) is more, but that is true only if the workers
> > > > run serially which is not true.
> > > >
> > > > > I thought the vacuum delay for
> > > > > parallel vacuum should work as if it's a single process vacuum as we
> > > > > did for memory usage. I might be missing something. If we prefer
> > > > > approach(b) I should change the patch so that the leader process
> > > > > divides the cost limit evenly.
> > > > >
> > > >
> > > > I am also not completely sure which approach is better but I slightly
> > > > lean towards approach (b).
> > >
> > > Can we get the same sleep time as approach (b) if we divide the cost
> > > limit by the number of workers and have the shared cost balance (i.e.
> > > approach (a) with dividing the cost limit)? Currently the approach (b)
> > > seems better but I'm concerned that it might unnecessarily delay
> > > vacuum if some indexes are very small or bulk-deletions of indexes
> > > does almost nothing such as brin.
> >
> > Are you worried that some of the workers might not have much I/O to do
> > but still we divide the cost limit equally?
>
> Yes.
>
> > If that is the case then
> > that is the case with the auto vacuum workers also right?
>
> I think It is not right because we rebalance the cost after an
> autovacuum worker finished. So as Amit mentioned on the new thread we
> might need to make parallel vacuum workers notice to the leader once
> exited so that it can rebalance the cost.

I agree that when an autovacuum worker finishes we rebalance the cost,
and that we need to do something similar here.  That will be a bit
difficult to implement in the parallel vacuum case, though.

We might need a shared memory array where we set a worker's status to
running as soon as the worker starts.  When a worker exits we can set
it to false and also set a flag saying that we need cost rebalancing.
Then, in vacuum_delay_point, if we see that rebalancing is needed, we
can scan the shared memory array, find out how many workers are still
running, and rebalance based on that.  Having said that, I think for
rebalancing we really just need a shared memory counter of how many
workers are running.
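
For illustration, a minimal standalone sketch of that counter-based
rebalancing (all the names here are invented; this is not code from the
patch):

#include <stdio.h>

typedef struct
{
    int total_cost_limit;   /* vacuum_cost_limit for the whole operation */
    int nworkers_running;   /* updated when a worker starts or exits */
} SharedCostState;

static int
worker_cost_limit(const SharedCostState *shared)
{
    /* Divide the limit among the workers that are still running. */
    int n = (shared->nworkers_running > 0) ? shared->nworkers_running : 1;

    return shared->total_cost_limit / n;
}

int
main(void)
{
    SharedCostState shared = {200, 5};

    printf("limit per worker, 5 running: %d\n", worker_cost_limit(&shared));

    /* One worker finishes its indexes and exits: rebalance. */
    shared.nworkers_running--;
    printf("limit per worker, 4 running: %d\n", worker_cost_limit(&shared));

    return 0;
}

In the real patch the counter would of course have to live in DSM and be
updated under a lock or atomically, but the rebalancing rule itself is
just the division above.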

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
Hi
I took all the attached patches (v32-01 to v32-4) and one of Dilip's patches from the "Questions/Observations related to Gist vacuum" mail thread. On top of all these patches, I created one more patch to test parallel vacuum functionally with the whole existing test suite.
For reference, I am attaching the patch.

What does this patch do?
As we know, vacuum uses parallel workers only when the parallel option is given. So, to test, I used the existing GUC force_parallel_mode and tested parallel vacuuming.

If force_parallel_mode is set to regress and the parallel option is not given with vacuum, I force the use of parallel workers for the vacuum. If there is only one index and no parallel degree is given with vacuum (or the parallel option is not given at all), and force_parallel_mode = regress, then I launch one parallel worker (the leader does no index work in this case); but if there is more than one index, then the leader works on one index and workers are launched for all the other indexes.
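
For clarity, a rough standalone sketch of the worker-count rule described
above, with invented names (it only restates the rule; it is not taken
from my patch and it ignores max_parallel_maintenance_workers):

#include <stdbool.h>
#include <stdio.h>

typedef enum { FPM_OFF, FPM_ON, FPM_REGRESS } ForceParallelMode;

static int
choose_parallel_workers(ForceParallelMode mode, bool parallel_option_given,
                        int requested_degree, int nindexes)
{
    if (parallel_option_given)
        return requested_degree;    /* an explicit PARALLEL option wins */

    if (mode != FPM_REGRESS)
        return 0;                   /* plain serial vacuum */

    /* force_parallel_mode = regress: push index vacuuming to workers */
    if (nindexes == 1)
        return 1;                   /* one worker, the leader stays idle */

    return nindexes - 1;            /* the leader takes one index itself */
}

int
main(void)
{
    printf("%d\n", choose_parallel_workers(FPM_REGRESS, false, 0, 1)); /* 1 */
    printf("%d\n", choose_parallel_workers(FPM_REGRESS, false, 0, 4)); /* 3 */
    return 0;
}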

After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make check-world).

I have a question regarding my patch: should we do vacuuming using parallel workers even if force_parallel_mode is set to on, or should we use a new GUC to test parallel worker vacuum with the existing test suite?

Please let me know your thoughts on this patch.

Thanks and Regards
Mahendra Thalor

On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum.  So the idea of computing the total cost delay is
>
> Total cost delay = Total delay of heap scan + Total delay of
> index/worker;  Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1].  I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach.   But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing:  I have performed 2 tests, one with the same size indexes and
> second with the different size indexes and measured total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                  Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
> Total Delay      1784 (ms)       1398 (ms)         1938 (ms)
>
>
> Test2 (Variable size dead tuple in index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                  Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
> Total Delay      1438 (ms)       1029 (ms)         1529 (ms)
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide the problem is solved but the delay bit
> more compared to the non-parallel version.  The reason could be the
> problem discussed at[2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> will also try to test different types of indexes.
>

Thank you for testing!

I realized that v31-0006 patch doesn't work fine so I've attached the
updated version patch that also incorporated some comments I got so
far. Sorry for the inconvenience. I'll apply your 0001 patch and also
test the total delay time.

Regards,

--
Masahiko Sawada
Attachment

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
> Hi
> I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
> For reference, I am attaching patch.
>
> What does this patch?
> As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
>
> If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
>
> After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
>
> I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?

IMHO, with force_parallel_mode=on we don't need to do anything here,
because that setting is useful for normal query parallelism: if the
user thinks the parallel plan should have been selected by the planner
but the planner did not select it, the user can force it and check.
Vacuum parallelism, however, is itself forced by the user, so there is
no point in doing it with force_parallel_mode=on.  On the other hand,
force_parallel_mode=regress is useful for testing vacuum with the
existing test suite.

>
> Please let me know your thoughts for this patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> > Hi
> > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.

Thank you for looking at this patch!

> > For reference, I am attaching patch.
> >
> > What does this patch?
> > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> >
> > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> >
> > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> >
> > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
>
> IMHO, with force_parallel_mode=on we don't need to do anything here
> because that is useful for normal query parallelism where if the user
> thinks that the parallel plan should have been selected by the planer
> but planer did not select the parallel plan then the user can force
> and check.  But, vacuum parallelism is itself forced by the user so
> there is no point in doing it with force_parallel_mode=on.

Yeah, I think so too. force_parallel_mode is a planner parameter, and
parallel vacuum can be forced by the vacuum option.

>  However,
> force_parallel_mode=regress is useful for testing the vacuum with an
> existing test suit.

If we want to control the leader participation by a GUC parameter, I
think we would need another GUC rather than reusing
force_parallel_mode. It would also be useful if we could use that
parameter for parallel CREATE INDEX as well, but that should be a
separate patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> > >
> > > Hi
> > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
>
> Thank you for looking at this patch!
>
> > > For reference, I am attaching patch.
> > >
> > > What does this patch?
> > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > >
> > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> > >
> > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> > >
> > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> >
> > IMHO, with force_parallel_mode=on we don't need to do anything here
> > because that is useful for normal query parallelism where if the user
> > thinks that the parallel plan should have been selected by the planer
> > but planer did not select the parallel plan then the user can force
> > and check.  But, vacuum parallelism is itself forced by the user so
> > there is no point in doing it with force_parallel_mode=on.
>
> Yeah I think so too. force_parallel_mode is a planner parameter and
> parallel vacuum can be forced by vacuum option.
>
> >  However,
> > force_parallel_mode=regress is useful for testing the vacuum with an
> > existing test suit.
>
> If we want to control the leader participation by GUC parameter I
> think we would need to have another GUC parameter rather than using
> force_parallel_mode.
I think the purpose is not to disable the leader participation;
instead, I think the purpose of 'force_parallel_mode=regress' is that,
without changing the existing test suite, we can execute the existing
vacuum commands in the test suite with a worker.  I did not study the
patch, but the idea should be that if "force_parallel_mode=regress" is
set then a normal vacuum command is executed in parallel by using 1
worker.

> And it's useful if we can use the parameter for
> parallel CREATE INDEX as well. But it should be a separate patch.
>

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 6 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> > > >
> > > > Hi
> > > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
> >
> > Thank you for looking at this patch!
> >
> > > > For reference, I am attaching patch.
> > > >
> > > > What does this patch?
> > > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > > >
> > > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> > > >
> > > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> > > >
> > > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> > >
> > > IMHO, with force_parallel_mode=on we don't need to do anything here
> > > because that is useful for normal query parallelism where if the user
> > > thinks that the parallel plan should have been selected by the planer
> > > but planer did not select the parallel plan then the user can force
> > > and check.  But, vacuum parallelism is itself forced by the user so
> > > there is no point in doing it with force_parallel_mode=on.
> >
> > Yeah I think so too. force_parallel_mode is a planner parameter and
> > parallel vacuum can be forced by vacuum option.
> >
> > >  However,
> > > force_parallel_mode=regress is useful for testing the vacuum with an
> > > existing test suit.
> >
> > If we want to control the leader participation by GUC parameter I
> > think we would need to have another GUC parameter rather than using
> > force_parallel_mode.
> I think the purpose is not to disable the leader participation,
> instead, I think the purpose of 'force_parallel_mode=regress' is that
> without changing the existing test suit we can execute the existing
> vacuum commands in the test suit with the worker.  I did not study the
> patch but the idea should be that if "force_parallel_mode=regress"
> then normal vacuum command should be executed in parallel by using 1
> worker.

Oh, I got it. Considering the current parallel vacuum design, I'm not
sure that we can cover more test cases by forcing parallel vacuum
during the existing test suite, because most of those would be tables
with several indexes and a single index vacuum cycle. It might be
better to add more test cases for parallel vacuum.

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, 6 Nov 2019, 20:07 Masahiko Sawada, <masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 6 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> > > >
> > > > Hi
> > > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
> >
> > Thank you for looking at this patch!
> >
> > > > For reference, I am attaching patch.
> > > >
> > > > What does this patch?
> > > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > > >
> > > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> > > >
> > > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> > > >
> > > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> > >
> > > IMHO, with force_parallel_mode=on we don't need to do anything here
> > > because that is useful for normal query parallelism where if the user
> > > thinks that the parallel plan should have been selected by the planer
> > > but planer did not select the parallel plan then the user can force
> > > and check.  But, vacuum parallelism is itself forced by the user so
> > > there is no point in doing it with force_parallel_mode=on.
> >
> > Yeah I think so too. force_parallel_mode is a planner parameter and
> > parallel vacuum can be forced by vacuum option.
> >
> > >  However,
> > > force_parallel_mode=regress is useful for testing the vacuum with an
> > > existing test suit.
> >
> > If we want to control the leader participation by GUC parameter I
> > think we would need to have another GUC parameter rather than using
> > force_parallel_mode.
> I think the purpose is not to disable the leader participation,
> instead, I think the purpose of 'force_parallel_mode=regress' is that
> without changing the existing test suit we can execute the existing
> vacuum commands in the test suit with the worker.  I did not study the
> patch but the idea should be that if "force_parallel_mode=regress"
> then normal vacuum command should be executed in parallel by using 1
> worker.

Oh I got it. Considering the current parallel vacuum design I'm not
sure that we can cover more test cases by forcing parallel vacuum
during existing test suite because most of these would be tables with
several indexes and one index vacuum cycle.
Oh sure, but still it would be good to get them tested with the parallel vacuum.
 
It might be better to add
more test cases for parallel vacuum.

 I agree that it would be good to add additional test cases.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I realized that v31-0006 patch doesn't work fine so I've attached the
> updated version patch that also incorporated some comments I got so
> far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> test the total delay time.
>

+ /*
+ * Generally index cleanup does not scan the index when index
+ * vacuuming (ambulkdelete) was already performed.  So we perform
+ * index cleanup with parallel workers only if we have not
+ * performed index vacuuming yet.  Otherwise, we do it in the
+ * leader process alone.
+ */
+ if (vacrelstats->num_index_scans == 0)
+ lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+ stats, lps);

Today, I was thinking about this point.  This check will work for most
cases, but there are still exceptions; for example, for a brin index
the main work is done in the amvacuumcleanup function.  Similarly, I
think there are a few more indexes, like gin and bloom, where it seems
we take another pass over the index in the amvacuumcleanup phase.
Don't you think we should try to allow parallel workers for such
cases?  If so, I don't have any great ideas on how to do that, but
what comes to my mind is to indicate it via the stats
(IndexBulkDeleteResult) or via an indexam API.  I am not sure whether
it is acceptable to have an indexam API for this.

Thoughts?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
Thanks Masahiko san and Dilip for looking into this patch.

In the previous patch, when 'force_parallel_mode=regress', I was doing all the vacuuming using multiple workers, but we should do all the vacuuming using only 1 worker (the leader should not participate in vacuuming). So I am attaching a patch for the same.

What does this patch do?
If 'force_parallel_mode=regress' and the parallel option is not given with vacuum, then all the vacuuming work will be done by one single worker and the leader will not participate.  But if the parallel option is given with vacuum, then preference is given to the specified degree.

After applying this patch, all the test cases are passing (make check-world), and I can't see any improvement in code coverage with this patch.

Please let me know your thoughts on this patch.

Thanks and Regards
Mahendra Thalor


On Wed, 6 Nov 2019 at 16:59, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> > >
> > > Hi
> > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
>
> Thank you for looking at this patch!
>
> > > For reference, I am attaching patch.
> > >
> > > What does this patch?
> > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > >
> > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> > >
> > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> > >
> > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> >
> > IMHO, with force_parallel_mode=on we don't need to do anything here
> > because that is useful for normal query parallelism where if the user
> > thinks that the parallel plan should have been selected by the planer
> > but planer did not select the parallel plan then the user can force
> > and check.  But, vacuum parallelism is itself forced by the user so
> > there is no point in doing it with force_parallel_mode=on.
>
> Yeah I think so too. force_parallel_mode is a planner parameter and
> parallel vacuum can be forced by vacuum option.
>
> >  However,
> > force_parallel_mode=regress is useful for testing the vacuum with an
> > existing test suit.
>
> If we want to control the leader participation by GUC parameter I
> think we would need to have another GUC parameter rather than using
> force_parallel_mode.
I think the purpose is not to disable the leader participation,
instead, I think the purpose of 'force_parallel_mode=regress' is that
without changing the existing test suit we can execute the existing
vacuum commands in the test suit with the worker.  I did not study the
patch but the idea should be that if "force_parallel_mode=regress"
then normal vacuum command should be executed in parallel by using 1
worker.

> And it's useful if we can use the parameter for
> parallel CREATE INDEX as well. But it should be a separate patch.
>

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I realized that v31-0006 patch doesn't work fine so I've attached the
> > updated version patch that also incorporated some comments I got so
> > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > test the total delay time.
> >
>
> + /*
> + * Generally index cleanup does not scan the index when index
> + * vacuuming (ambulkdelete) was already performed.  So we perform
> + * index cleanup with parallel workers only if we have not
> + * performed index vacuuming yet.  Otherwise, we do it in the
> + * leader process alone.
> + */
> + if (vacrelstats->num_index_scans == 0)
> + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> + stats, lps);
>
> Today, I was thinking about this point where this check will work for
> most cases, but still, exceptions are there like for brin index, the
> main work is done in amvacuumcleanup function.  Similarly, I think
> there are few more indexes like gin, bloom where it seems we take
> another pass over-index in the amvacuumcleanup phase.  Don't you think
> we should try to allow parallel workers for such cases?  If so, I
> don't have any great ideas on how to do that, but what comes to my
> mind is to indicate that via stats (
> IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> acceptable to have indexam API for this.
>
> Thoughts?

Good point. gin and bloom do certain heavy work during cleanup and
during bulkdelete, as you mentioned. Brin does it during cleanup, and
hash and gist do it during bulkdelete. So there are three types of
index AM just inside the postgres code. An idea I came up with is that
we can control parallel vacuum and parallel cleanup separately.  That
is, we add a variable amcanparallelcleanup and do parallel cleanup
only on indexes whose amcanparallelcleanup is true. The
IndexBulkDeleteResult can be stored locally if both amcanparallelvacuum
and amcanparallelcleanup of an index are false, because only the
leader process deals with such indexes. Otherwise we need to store it
in DSM as always.
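
A minimal standalone sketch of that storage rule, with an invented
function name (not code from the patch):

#include <stdbool.h>
#include <stdio.h>

/* Keep an index's IndexBulkDeleteResult in backend-local memory only
 * when no worker can ever touch that index; otherwise it has to live
 * in the DSM segment so that workers can update it. */
static bool
stats_must_live_in_dsm(bool amcanparallelvacuum, bool amcanparallelcleanup)
{
    if (!amcanparallelvacuum && !amcanparallelcleanup)
        return false;           /* only the leader handles this index */

    return true;                /* some worker may update the stats */
}

int
main(void)
{
    printf("both flags false:   %d\n", stats_must_live_in_dsm(false, false));
    printf("cleanup-only index: %d\n", stats_must_live_in_dsm(false, true));
    return 0;
}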

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > updated version patch that also incorporated some comments I got so
> > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > test the total delay time.
> > >
> >
> > + /*
> > + * Generally index cleanup does not scan the index when index
> > + * vacuuming (ambulkdelete) was already performed.  So we perform
> > + * index cleanup with parallel workers only if we have not
> > + * performed index vacuuming yet.  Otherwise, we do it in the
> > + * leader process alone.
> > + */
> > + if (vacrelstats->num_index_scans == 0)
> > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > + stats, lps);
> >
> > Today, I was thinking about this point where this check will work for
> > most cases, but still, exceptions are there like for brin index, the
> > main work is done in amvacuumcleanup function.  Similarly, I think
> > there are few more indexes like gin, bloom where it seems we take
> > another pass over-index in the amvacuumcleanup phase.  Don't you think
> > we should try to allow parallel workers for such cases?  If so, I
> > don't have any great ideas on how to do that, but what comes to my
> > mind is to indicate that via stats (
> > IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> > acceptable to have indexam API for this.
> >
> > Thoughts?
>
> Good point. gin and bloom do a certain heavy work during cleanup and
> during bulkdelete as you mentioned. Brin does it during cleanup, and
> hash and gist do it during bulkdelete. There are three types of index
> AM just inside postgres code. An idea I came up with is that we can
> control parallel vacuum and parallel cleanup separately.  That is,
> adding a variable amcanparallelcleanup and we can do parallel cleanup
> on only indexes of which amcanparallelcleanup is true. IndexBulkDelete
> can be stored locally if both amcanparallelvacuum and
> amcanparallelcleanup of an index are false because only the leader
> process deals with such indexes. Otherwise we need to store it in DSM
> as always.
>
IIUC, amcanparallelcleanup will be true for those indexes which do
heavy work during cleanup irrespective of whether bulkdelete is called
or not, e.g. gin? If so, along with the amcanparallelcleanup flag, we
need to consider vacrelstats->num_index_scans, right? So if
vacrelstats->num_index_scans == 0 then we need to launch parallel
workers for all the indexes that support amcanparallelvacuum, and if
vacrelstats->num_index_scans > 0 then only for those that have
amcanparallelcleanup set to true.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > updated version patch that also incorporated some comments I got so
> > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > test the total delay time.
> > > >
> > >
> > > + /*
> > > + * Generally index cleanup does not scan the index when index
> > > + * vacuuming (ambulkdelete) was already performed.  So we perform
> > > + * index cleanup with parallel workers only if we have not
> > > + * performed index vacuuming yet.  Otherwise, we do it in the
> > > + * leader process alone.
> > > + */
> > > + if (vacrelstats->num_index_scans == 0)
> > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > > + stats, lps);
> > >
> > > Today, I was thinking about this point where this check will work for
> > > most cases, but still, exceptions are there like for brin index, the
> > > main work is done in amvacuumcleanup function.  Similarly, I think
> > > there are few more indexes like gin, bloom where it seems we take
> > > another pass over-index in the amvacuumcleanup phase.  Don't you think
> > > we should try to allow parallel workers for such cases?  If so, I
> > > don't have any great ideas on how to do that, but what comes to my
> > > mind is to indicate that via stats (
> > > IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> > > acceptable to have indexam API for this.
> > >
> > > Thoughts?
> >
> > Good point. gin and bloom do a certain heavy work during cleanup and
> > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > hash and gist do it during bulkdelete. There are three types of index
> > AM just inside postgres code. An idea I came up with is that we can
> > control parallel vacuum and parallel cleanup separately.  That is,
> > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > on only indexes of which amcanparallelcleanup is true. IndexBulkDelete
> > can be stored locally if both amcanparallelvacuum and
> > amcanparallelcleanup of an index are false because only the leader
> > process deals with such indexes. Otherwise we need to store it in DSM
> > as always.
> >
> IIUC,  amcanparallelcleanup will be true for those indexes which does
> heavy work during cleanup irrespective of whether bulkdelete is called
> or not e.g. gin?

Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
might set amcanparallelvacuum to true as well).

>  If so, along with an amcanparallelcleanup flag, we
> need to consider vacrelstats->num_index_scans right? So if
> vacrelstats->num_index_scans == 0 then we need to launch parallel
> worker for all the indexes who support amcanparallelvacuum and if
> vacrelstats->num_index_scans > 0 then only for those who has
> amcanparallelcleanup as true.

Yes, you're right. But this won't work well for brin indexes, which
don't want to participate in parallel vacuum but always want to
participate in parallel cleanup.

After more thought, I think we can have a ternary value: never,
always, once. If it's 'never', the index never participates in
parallel cleanup; I guess hash indexes use 'never'. Next, if it's
'always', the index always participates regardless of
vacrelstats->num_index_scans; I guess gin, brin and bloom use
'always'. Finally, if it's 'once', the index participates in parallel
cleanup only the first time (that is, when
vacrelstats->num_index_scans == 0); I guess btree, gist and spgist use
'once'.
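
To make the idea concrete, a rough standalone sketch with invented
names (the real flag and enum names would be decided in the patch):

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    PARALLEL_CLEANUP_NEVER,     /* e.g. hash */
    PARALLEL_CLEANUP_ONCE,      /* e.g. btree, gist, spgist */
    PARALLEL_CLEANUP_ALWAYS     /* e.g. gin, brin, bloom */
} ParallelCleanupMode;

static bool
cleanup_in_parallel(ParallelCleanupMode mode, int num_index_scans)
{
    switch (mode)
    {
        case PARALLEL_CLEANUP_ALWAYS:
            return true;
        case PARALLEL_CLEANUP_ONCE:
            /* only worth it when no bulkdelete pass has run yet */
            return num_index_scans == 0;
        case PARALLEL_CLEANUP_NEVER:
        default:
            return false;
    }
}

int
main(void)
{
    /* btree-like index: parallel cleanup only if bulkdelete never ran */
    printf("%d\n", cleanup_in_parallel(PARALLEL_CLEANUP_ONCE, 0));   /* 1 */
    printf("%d\n", cleanup_in_parallel(PARALLEL_CLEANUP_ONCE, 2));   /* 0 */
    /* brin-like index: always cleaned up with workers */
    printf("%d\n", cleanup_in_parallel(PARALLEL_CLEANUP_ALWAYS, 2)); /* 1 */
    return 0;
}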

Regards,


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > updated version patch that also incorporated some comments I got so
> > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > test the total delay time.
> > > > >
> > > >
> > > > + /*
> > > > + * Generally index cleanup does not scan the index when index
> > > > + * vacuuming (ambulkdelete) was already performed.  So we perform
> > > > + * index cleanup with parallel workers only if we have not
> > > > + * performed index vacuuming yet.  Otherwise, we do it in the
> > > > + * leader process alone.
> > > > + */
> > > > + if (vacrelstats->num_index_scans == 0)
> > > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > > > + stats, lps);
> > > >
> > > > Today, I was thinking about this point where this check will work for
> > > > most cases, but still, exceptions are there like for brin index, the
> > > > main work is done in amvacuumcleanup function.  Similarly, I think
> > > > there are few more indexes like gin, bloom where it seems we take
> > > > another pass over-index in the amvacuumcleanup phase.  Don't you think
> > > > we should try to allow parallel workers for such cases?  If so, I
> > > > don't have any great ideas on how to do that, but what comes to my
> > > > mind is to indicate that via stats (
> > > > IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> > > > acceptable to have indexam API for this.
> > > >
> > > > Thoughts?
> > >
> > > Good point. gin and bloom do a certain heavy work during cleanup and
> > > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > > hash and gist do it during bulkdelete. There are three types of index
> > > AM just inside postgres code. An idea I came up with is that we can
> > > control parallel vacuum and parallel cleanup separately.  That is,
> > > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > > on only indexes of which amcanparallelcleanup is true. IndexBulkDelete
> > > can be stored locally if both amcanparallelvacuum and
> > > amcanparallelcleanup of an index are false because only the leader
> > > process deals with such indexes. Otherwise we need to store it in DSM
> > > as always.
> > >
> > IIUC,  amcanparallelcleanup will be true for those indexes which does
> > heavy work during cleanup irrespective of whether bulkdelete is called
> > or not e.g. gin?
>
> Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
> might set amcanparallelvacuum to true as well).
>
> >  If so, along with an amcanparallelcleanup flag, we
> > need to consider vacrelstats->num_index_scans right? So if
> > vacrelstats->num_index_scans == 0 then we need to launch parallel
> > worker for all the indexes who support amcanparallelvacuum and if
> > vacrelstats->num_index_scans > 0 then only for those who has
> > amcanparallelcleanup as true.
>
> Yes, you're right. But this won't work fine for brin indexes who don't
> want to participate in parallel vacuum but always want to participate
> in parallel cleanup.
Yeah, right.
>
> After more thoughts, I think we can have a ternary value: never,
> always, once. If it's 'never' the index never participates in parallel
> cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> index always participates regardless of vacrelstats->num_index_scan. I
> guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> index participates in parallel cleanup only when it's the first time
> (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> spgist use 'once'.
Yeah, this makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I realized that v31-0006 patch doesn't work fine so I've attached the
> updated version patch that also incorporated some comments I got so
> far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> test the total delay time.
>
While reviewing the 0002 patch, I got one doubt related to how we are
dividing the maintenance_work_mem:

+prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
+{
+ /* Compute the new maitenance_work_mem value for index vacuuming */
+ lvshared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : maintenance_work_mem;
+}
Is it fair to consider just the number of indexes which use
maintenance_work_mem?  Or do we need to consider the number of workers
as well?  My point is: suppose there are 10 indexes which will use
maintenance_work_mem but we are launching just 2 workers; then what is
the point in dividing the maintenance_work_mem by 10?

IMHO the calculation should be like this:

lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
    maintenance_work_mem / Min(nindexes_mwm, nworkers) :
    maintenance_work_mem;

Am I missing something?
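
To make the difference concrete, a standalone example with made-up
numbers (1024 MB of maintenance_work_mem, 10 indexes that use it, only
2 workers launched):

#include <stdio.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

int
main(void)
{
    int maintenance_work_mem = 1024;    /* MB */
    int nindexes_mwm = 10;
    int nworkers = 2;

    /* current patch: divide by the number of such indexes */
    printf("divide by nindexes_mwm:             %d MB per worker\n",
           maintenance_work_mem / nindexes_mwm);

    /* proposed: divide by how many can actually run at the same time */
    printf("divide by Min(nindexes_mwm, nwrks): %d MB per worker\n",
           maintenance_work_mem / Min(nindexes_mwm, nworkers));

    return 0;
}

With these numbers each worker would get 102 MB under the first formula
but 512 MB under the second, even though at most two indexes are being
vacuumed at any one time.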

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
Hi All,
I did some performance testing, with the help of Dilip, to compare normal vacuum and parallel vacuum. Below is the test summary.

Configuration settings:
autovacuum = off
shared_buffers = 2GB
max_parallel_maintenance_workers = 6

Test 1: (table has 4 indexes on all tuples; alternate tuples are deleted)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i1 on test (a);
create index i2 on test (b);
create index i3 on test (c);
create index i4 on test (d);
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;

case 1: (run normal vacuum)
vacuum test;
1019.453 ms

Case 2: (run vacuum with 1 parallel degree)
vacuum (parallel 1) test;
765.366 ms

Case 3:(run vacuum with 3 parallel degree)
vacuum (parallel 3) test;
555.227 ms

From the above results, we can conclude that, with the help of parallel vacuum, performance is improved for large indexes.

Test 2: (table has 16 small indexes; alternate tuples are deleted)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i1 on test (a) where a < 100000;
create index i2 on test (a) where a > 100000 and a < 200000;
create index i3 on test (a) where a > 200000 and a < 300000;
create index i4 on test (a) where a > 300000 and a < 400000;
create index i5 on test (a) where a > 400000 and a < 500000;
create index i6 on test (a) where a > 500000 and a < 600000;
create index i7 on test (b) where a < 100000;
create index i8 on test (c) where a < 100000;
create index i9 on test (d) where a < 100000;
create index i10 on test (d) where a < 100000;
create index i11 on test (d) where a < 100000;
create index i12 on test (d) where a < 100000;
create index i13 on test (d) where a < 100000;
create index i14 on test (d) where a < 100000;
create index i15 on test (d) where a < 100000;
create index i16 on test (d) where a < 100000;
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;

case 1: (run normal vacuum)
vacuum test;
649.187 ms

Case 2: (run vacuum with 1 parallel degree)
vacuum (parallel 1) test;
492.075 ms

Case 3:(run vacuum with 3 parallel degree)
vacuum (parallel 3) test;
435.581 ms

Even for small indexes, we gained some performance with parallel vacuum.

I will continue my testing for stats collection.

Please let me know if anybody has any suggestions for other testing (what should be tested).

Thanks and Regards
Mahendra Thalor

On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum.  So the idea of computing the total cost delay is
>
> Total cost delay = Total dealy of heap scan + Total dealy of
> index/worker;  Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1].  I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach.   But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing:  I have performed 2 tests, one with the same size indexes and
> second with the different size indexes and measured total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                    Vacuum (Head)    Parallel Vacuum    Vacuum Cost Divide Patch
> Total Delay        1784 (ms)         1398 (ms)          1938 (ms)
>
>
> Test2 (Variable size dead tuple in index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                    Vacuum (Head)    Parallel Vacuum    Vacuum Cost Divide Patch
> Total Delay        1438 (ms)         1029 (ms)          1529 (ms)
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide the problem is solved but the delay bit
> more compared to the non-parallel version.  The reason could be the
> problem discussed at[2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> will also try to test different types of indexes.
>

Thank you for testing!

I realized that v31-0006 patch doesn't work fine so I've attached the
updated version patch that also incorporated some comments I got so
far. Sorry for the inconvenience. I'll apply your 0001 patch and also
test the total delay time.

Regards,

--
Masahiko Sawada

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Good point. gin and bloom do a certain heavy work during cleanup and
> > > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > > hash and gist do it during bulkdelete. There are three types of index
> > > AM just inside postgres code. An idea I came up with is that we can
> > > control parallel vacuum and parallel cleanup separately.  That is,
> > > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > > on only indexes of which amcanparallelcleanup is true.
> > >

This is what I mentioned in my email as a second option (whether to
expose via IndexAM).  I am not sure if we can have a new variable just
for this.

> > > IndexBulkDelete
> > > can be stored locally if both amcanparallelvacuum and
> > > amcanparallelcleanup of an index are false because only the leader
> > > process deals with such indexes. Otherwise we need to store it in DSM
> > > as always.
> > >
> > IIUC,  amcanparallelcleanup will be true for those indexes which does
> > heavy work during cleanup irrespective of whether bulkdelete is called
> > or not e.g. gin?
>
> Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
> might set amcanparallevacuum to true as well).
>
> >  If so, along with an amcanparallelcleanup flag, we
> > need to consider vacrelstats->num_index_scans right? So if
> > vacrelstats->num_index_scans == 0 then we need to launch parallel
> > worker for all the indexes who support amcanparallelvacuum and if
> > vacrelstats->num_index_scans > 0 then only for those who has
> > amcanparallelcleanup as true.
>
> Yes, you're right. But this won't work fine for brin indexes who don't
> want to participate in parallel vacuum but always want to participate
> in parallel cleanup.
>
> After more thoughts, I think we can have a ternary value: never,
> always, once. If it's 'never' the index never participates in parallel
> cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> index always participates regardless of vacrelstats->num_index_scan. I
> guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> index participates in parallel cleanup only when it's the first time
> (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> spgist use 'once'.
>

I think this 'once' option is confusing, especially because it also
depends on 'num_index_scans', which the IndexAM has no control over.
It might be that the option name is not good, but I am not sure.
Another thing is that for brin indexes, we don't want bulkdelete to
participate in parallelism.  Do we want to have separate variables for
ambulkdelete and amvacuumcleanup which decide whether the particular
phase can be done in parallel?  Another possibility could be to just
have one variable (say uint16 amparallelvacuum) which will tell us all
the options, but I don't think that will be a popular approach
considering all the other methods and variables exposed.  What do you
think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
>
> For small indexes also, we gained some performance by parallel vacuum.
>

Thanks for doing all these tests.  It is clear from this and previous
tests that this patch has a benefit in a wide variety of cases.  However,
we should try to see some worst cases as well.  For example, if there
are multiple indexes on a table and only one of them is large whereas
all the others are very small, say having a few hundred or a thousand rows.

Note: Please don't use the top-posting style to reply.  Here, we use
inline reply.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Good point. gin and bloom do a certain heavy work during cleanup and
> > > > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > > > hash and gist do it during bulkdelete. There are three types of index
> > > > AM just inside postgres code. An idea I came up with is that we can
> > > > control parallel vacuum and parallel cleanup separately.  That is,
> > > > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > > > on only indexes of which amcanparallelcleanup is true.
> > > >
>
> This is what I mentioned in my email as a second option (whether to
> expose via IndexAM).  I am not sure if we can have a new variable just
> for this.
>
> > > > IndexBulkDelete
> > > > can be stored locally if both amcanparallelvacuum and
> > > > amcanparallelcleanup of an index are false because only the leader
> > > > process deals with such indexes. Otherwise we need to store it in DSM
> > > > as always.
> > > >
> > > IIUC,  amcanparallelcleanup will be true for those indexes which does
> > > heavy work during cleanup irrespective of whether bulkdelete is called
> > > or not e.g. gin?
> >
> > Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
> > might set amcanparallevacuum to true as well).
> >
> > >  If so, along with an amcanparallelcleanup flag, we
> > > need to consider vacrelstats->num_index_scans right? So if
> > > vacrelstats->num_index_scans == 0 then we need to launch parallel
> > > worker for all the indexes who support amcanparallelvacuum and if
> > > vacrelstats->num_index_scans > 0 then only for those who has
> > > amcanparallelcleanup as true.
> >
> > Yes, you're right. But this won't work fine for brin indexes who don't
> > want to participate in parallel vacuum but always want to participate
> > in parallel cleanup.
> >
> > After more thoughts, I think we can have a ternary value: never,
> > always, once. If it's 'never' the index never participates in parallel
> > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > index always participates regardless of vacrelstats->num_index_scan. I
> > guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> > index participates in parallel cleanup only when it's the first time
> > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> > spgist use 'once'.
> >
>
> I think this 'once' option is confusing especially because it also
> depends on 'num_index_scans' which the IndexAM has no control over.
> It might be that the option name is not good, but I am not sure.
> Another thing is that for brin indexes, we don't want bulkdelete to
> participate in parallelism.

I thought brin should set amcanparallelvacuum to false and
amcanparallelcleanup to 'always'.

> Do we want to have separate variables for
> ambulkdelete and amvacuumcleanup which decides whether the particular
> phase can be done in parallel?

You mean adding variables to ambulkdelete and amvacuumcleanup as
function arguments? If so, isn't it too late to tell the leader whether
the particular phase can be done in parallel?

> Another possibility could be to just
> have one variable (say uint16 amparallelvacuum) which will tell us all
> the options but I don't think that will be a popular approach
> considering all the other methods and variables exposed.  What do you
> think?

Adding only one variable that can hold flags would also be a good
idea, instead of having multiple variables for each option. For
instance, the FDW API uses such an interface (see eflags of BeginForeignScan).

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > After more thoughts, I think we can have a ternary value: never,
> > > always, once. If it's 'never' the index never participates in parallel
> > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > > index always participates regardless of vacrelstats->num_index_scan. I
> > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> > > index participates in parallel cleanup only when it's the first time
> > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> > > spgist use 'once'.
> > >
> >
> > I think this 'once' option is confusing especially because it also
> > depends on 'num_index_scans' which the IndexAM has no control over.
> > It might be that the option name is not good, but I am not sure.
> > Another thing is that for brin indexes, we don't want bulkdelete to
> > participate in parallelism.
>
> I thought brin should set amcanparallelvacuum is false and
> amcanparallelcleanup is 'always'.
>

In that case, it is better to name the variable amcanparallelbulkdelete.

> > Do we want to have separate variables for
> > ambulkdelete and amvacuumcleanup which decides whether the particular
> > phase can be done in parallel?
>
> You mean adding variables to ambulkdelete and amvacuumcleanup as
> function arguments?
>

No, I mean separate amcanparallelbulkdelete (bool) and
amcanparallelvacuumcleanup (uint16) variables.
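
Just to visualize the two shapes being discussed (purely a sketch; the
field names and value semantics are assumptions from this thread, not
something in the patch):

/* Option A: two separate members in the index AM handler struct */
bool    amcanparallelbulkdelete;     /* can ambulkdelete run in parallel? */
uint16  amcanparallelvacuumcleanup;  /* e.g. never / conditionally / always */

/* Option B: a single flags member covering both phases */
uint16  amparallelvacuum;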

>
> > Another possibility could be to just
> > have one variable (say uint16 amparallelvacuum) which will tell us all
> > the options but I don't think that will be a popular approach
> > considering all the other methods and variables exposed.  What do you
> > think?
>
> Adding only one variable that can have flags would also be a good
> idea, instead of having multiple variables for each option. For
> instance FDW API uses such interface (see eflags of BeginForeignScan).
>

Yeah, maybe something like amparallelvacuumoptions.  The options can be:

VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
vacuumcleanup) can't be performed in parallel
VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
performed in parallel (hash index will set this flag)
VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
flag)
VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
gin, gist, spgist, bloom will set this flag)
VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
parallel even if bulkdelete is already performed (Indexes gin, brin,
and bloom will set this flag)

Does something like this make sense?   If we all agree on this, then I
think we can summarize the part of the discussion related to this API
and get feedback from a broader audience.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > After more thoughts, I think we can have a ternary value: never,
> > > > always, once. If it's 'never' the index never participates in parallel
> > > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > > > index always participates regardless of vacrelstats->num_index_scan. I
> > > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> > > > index participates in parallel cleanup only when it's the first time
> > > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> > > > spgist use 'once'.
> > > >
> > >
> > > I think this 'once' option is confusing especially because it also
> > > depends on 'num_index_scans' which the IndexAM has no control over.
> > > It might be that the option name is not good, but I am not sure.
> > > Another thing is that for brin indexes, we don't want bulkdelete to
> > > participate in parallelism.
> >
> > I thought brin should set amcanparallelvacuum is false and
> > amcanparallelcleanup is 'always'.
> >
>
> In that case, it is better to name the variable as amcanparallelbulkdelete.
>
> > > Do we want to have separate variables for
> > > ambulkdelete and amvacuumcleanup which decides whether the particular
> > > phase can be done in parallel?
> >
> > You mean adding variables to ambulkdelete and amvacuumcleanup as
> > function arguments?
> >
>
> No, I mean separate variables amcanparallelbulkdelete (bool) and
> amcanparallelvacuumcleanup (unit16) variables.
>
> >
> > > Another possibility could be to just
> > > have one variable (say uint16 amparallelvacuum) which will tell us all
> > > the options but I don't think that will be a popular approach
> > > considering all the other methods and variables exposed.  What do you
> > > think?
> >
> > Adding only one variable that can have flags would also be a good
> > idea, instead of having multiple variables for each option. For
> > instance FDW API uses such interface (see eflags of BeginForeignScan).
> >
>
> Yeah, maybe something like amparallelvacuumoptions.  The options can be:
>
> VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> vacuumcleanup) can't be performed in parallel
> VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> performed in parallel (hash index will set this flag)

Maybe we don't want this option, because if 3 or 4 is not set then we
will not do the cleanup in parallel, right?

> VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> flag)
> VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> gin, gist, spgist, bloom will set this flag)
> VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> parallel even if bulkdelete is already performed (Indexes gin, brin,
> and bloom will set this flag)
>
> Does something like this make sense?
Yeah, something like that seems better to me.

> If we all agree on this, then I
> think we can summarize the part of the discussion related to this API
> and get feedback from a broader audience.

Makes sense.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Mon, 11 Nov 2019 at 16:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> >
> > For small indexes also, we gained some performance by parallel vacuum.
> >
>
> Thanks for doing all these tests.  It is clear with this and previous
> tests that this patch has benefit in wide variety of cases.  However,
> we should try to see some worst cases as well.  For example, if there
> are multiple indexes on a table and only one of them is large whereas
> all others are very small say having a few 100 or 1000 rows.
>

Thanks Amit for your comments.

I did some testing along the suggested lines above. Below is the summary:
Test case: (I created 16 indexes, but only 1 index is large; the others are very small)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i3 on test (a) where a > 2000 and a < 3000;
create index i4 on test (a) where a > 3000 and a < 4000;
create index i5 on test (a) where a > 4000 and a < 5000;
create index i6 on test (a) where a > 5000 and a < 6000;
create index i7 on test (b) where a < 1000;
create index i8 on test (c) where a < 1000;
create index i9 on test (d) where a < 1000;
create index i10 on test (d) where a < 1000;
create index i11 on test (d) where a < 1000;
create index i12 on test (d) where a < 1000;
create index i13 on test (d) where a < 1000;
create index i14 on test (d) where a < 1000;
create index i15 on test (d) where a < 1000;
create index i16 on test (d) where a < 1000;
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;

case 1: vacuum without using parallel workers.
vacuum test;
228.259 ms

case 2: vacuum with 1 parallel worker.
vacuum (parallel 1) test;
251.725 ms

case 3: vacuum with 3 parallel workers.
vacuum (parallel 3) test;
259.986 ms

From the above results, it seems that if indexes are small, then parallel vacuum is not beneficial compared to normal vacuum.

> Note: Please don't use the top-posting style to reply.  Here, we use
> inline reply.

Okay. I will follow the inline reply style.

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > After more thoughts, I think we can have a ternary value: never,
> > > > > always, once. If it's 'never' the index never participates in parallel
> > > > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > > > > index always participates regardless of vacrelstats->num_index_scan. I
> > > > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> > > > > index participates in parallel cleanup only when it's the first time
> > > > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> > > > > spgist use 'once'.
> > > > >
> > > >
> > > > I think this 'once' option is confusing especially because it also
> > > > depends on 'num_index_scans' which the IndexAM has no control over.
> > > > It might be that the option name is not good, but I am not sure.
> > > > Another thing is that for brin indexes, we don't want bulkdelete to
> > > > participate in parallelism.
> > >
> > > I thought brin should set amcanparallelvacuum is false and
> > > amcanparallelcleanup is 'always'.
> > >
> >
> > In that case, it is better to name the variable as amcanparallelbulkdelete.
> >
> > > > Do we want to have separate variables for
> > > > ambulkdelete and amvacuumcleanup which decides whether the particular
> > > > phase can be done in parallel?
> > >
> > > You mean adding variables to ambulkdelete and amvacuumcleanup as
> > > function arguments?
> > >
> >
> > No, I mean separate variables amcanparallelbulkdelete (bool) and
> > amcanparallelvacuumcleanup (unit16) variables.
> >
> > >
> > > > Another possibility could be to just
> > > > have one variable (say uint16 amparallelvacuum) which will tell us all
> > > > the options but I don't think that will be a popular approach
> > > > considering all the other methods and variables exposed.  What do you
> > > > think?
> > >
> > > Adding only one variable that can have flags would also be a good
> > > idea, instead of having multiple variables for each option. For
> > > instance FDW API uses such interface (see eflags of BeginForeignScan).
> > >
> >
> > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> >
> > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > vacuumcleanup) can't be performed in parallel
> > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > performed in parallel (hash index will set this flag)
>
> Maybe we don't want this option?  because if 3 or 4 is not set then we
> will not do the cleanup in parallel right?
>
> > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > flag)
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > gin, gist, spgist, bloom will set this flag)
> > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > and bloom will set this flag)
> >
> > Does something like this make sense?

3 and 4 confused me because 4 also looks conditional. How about having
two flags instead: one for doing parallel cleanup only when bulkdelete
has not been performed yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and
another one for always doing parallel cleanup
(VACUUM_OPTION_PARALLEL_CLEANUP)? That way, we can have flags as
follows, and the index AM chooses two flags: one from the first two
flags for bulk deletion and another from the next three flags for
cleanup.

VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
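
As a purely illustrative sketch of how these flags could be declared and
then checked on the leader side (the helper function, its 'options'
argument and the use of num_index_scans are assumptions for
illustration, not patch code):

#define VACUUM_OPTION_PARALLEL_NO_BULKDEL   (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1)
#define VACUUM_OPTION_PARALLEL_NO_CLEANUP   (1 << 2)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 3)
#define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 4)

/* e.g. nbtree might set: PARALLEL_BULKDEL | PARALLEL_COND_CLEANUP */

/* Would this index participate in the parallel cleanup phase? */
static bool
index_participates_in_parallel_cleanup(uint16 options, int num_index_scans)
{
    if (options & VACUUM_OPTION_PARALLEL_CLEANUP)
        return true;            /* always clean up in parallel */
    if ((options & VACUUM_OPTION_PARALLEL_COND_CLEANUP) &&
        num_index_scans == 0)
        return true;            /* only if bulkdelete was not performed */
    return false;
}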

> Yeah, something like that seems better to me.
>
> > If we all agree on this, then I
> > think we can summarize the part of the discussion related to this API
> > and get feedback from a broader audience.
>
> Make sense.

+1

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I realized that v31-0006 patch doesn't work fine so I've attached the
> > updated version patch that also incorporated some comments I got so
> > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > test the total delay time.
> >
> While reviewing the 0002, I got one doubt related to how we are
> dividing the maintainance_work_mem
>
> +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> +{
> + /* Compute the new maitenance_work_mem value for index vacuuming */
> + lvshared->maintenance_work_mem_worker =
> + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> maintenance_work_mem;
> +}
> Is it fair to just consider the number of indexes which use
> maintenance_work_mem?  Or we need to consider the number of worker as
> well.  My point is suppose there are 10 indexes which will use the
> maintenance_work_mem but we are launching just 2 workers then what is
> the point in dividing the maintenance_work_mem by 10.
>
> IMHO the calculation should be like this
> lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> maintenance_work_mem;
>
> Am I missing something?

No, I think you're right. On the other hand, I think that dividing it
by the number of indexes that will use maintenance_work_mem makes
sense when the parallel degree > the number of such indexes. Suppose the
table has 2 indexes and there are 10 workers; then we should divide
maintenance_work_mem by 2 rather than 10, because at most 2 indexes
that use maintenance_work_mem can be processed in parallel at a time.

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > >
> > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > vacuumcleanup) can't be performed in parallel
> > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > performed in parallel (hash index will set this flag)
> >
> > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > will not do the cleanup in parallel right?
> >

Yeah, but it is better to be explicit about this.

> > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > flag)
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > gin, gist, spgist, bloom will set this flag)
> > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > and bloom will set this flag)
> > >
> > > Does something like this make sense?
>
> 3 and 4 confused me because 4 also looks conditional. How about having
> two flags instead: one for doing parallel cleanup when not performed
> yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
>

Hmm, this is exactly what I intended to say with 3 and 4.  I am not sure
what makes you think 4 is conditional.

> That way, we
> can have flags as follows and index AM chooses two flags, one from the
> first two flags for bulk deletion and another from next three flags
> for cleanup.
>
> VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
>

This also looks reasonable, but if there is an index that doesn't want
to support a parallel vacuum, it needs to set multiple flags.

> > Yeah, something like that seems better to me.
> >
> > > If we all agree on this, then I
> > > think we can summarize the part of the discussion related to this API
> > > and get feedback from a broader audience.
> >
> > Make sense.
>
> +1
>

Okay, then I will write a separate email for this topic.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > updated version patch that also incorporated some comments I got so
> > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > test the total delay time.
> > >
> > While reviewing the 0002, I got one doubt related to how we are
> > dividing the maintainance_work_mem
> >
> > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > +{
> > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > + lvshared->maintenance_work_mem_worker =
> > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > maintenance_work_mem;
> > +}
> > Is it fair to just consider the number of indexes which use
> > maintenance_work_mem?  Or we need to consider the number of worker as
> > well.  My point is suppose there are 10 indexes which will use the
> > maintenance_work_mem but we are launching just 2 workers then what is
> > the point in dividing the maintenance_work_mem by 10.
> >
> > IMHO the calculation should be like this
> > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > maintenance_work_mem;
> >
> > Am I missing something?
>
> No, I think you're right. On the other hand I think that dividing it
> by the number of indexes that will use the mantenance_work_mem makes
> sense when parallel degree > the number of such indexes. Suppose the
> table has 2 indexes and there are 10 workers then we should divide the
> maintenance_work_mem by 2 rather than 10 because it's possible that at
> most 2 indexes that uses the maintenance_work_mem are processed in
> parallel at a time.
>
Right, that's the reason I suggested dividing by Min(nindexes_mwm, nworkers).


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 12 Nov 2019 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > >
> > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > vacuumcleanup) can't be performed in parallel
> > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > performed in parallel (hash index will set this flag)
> > >
> > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > will not do the cleanup in parallel right?
> > >
>
> Yeah, but it is better to be explicit about this.

VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? I think brin indexes
will use this flag. It will end up with
(VACUUM_OPTION_NO_PARALLEL_CLEANUP |
VACUUM_OPTION_NO_PARALLEL_BULKDEL) being equivalent to
VACUUM_OPTION_NO_PARALLEL, though.

>
> > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > flag)
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > gin, gist, spgist, bloom will set this flag)
> > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > and bloom will set this flag)
> > > >
> > > > Does something like this make sense?
> >
> > 3 and 4 confused me because 4 also looks conditional. How about having
> > two flags instead: one for doing parallel cleanup when not performed
> > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> >
>
> Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> what makes you think 4 is conditional.

Hmm, so why will gin and bloom set both flags 3 and 4? I thought if an
index sets 4 it doesn't need to set 3, because 4 means always doing
cleanup in parallel.

>
> > That way, we
> > can have flags as follows and index AM chooses two flags, one from the
> > first two flags for bulk deletion and another from next three flags
> > for cleanup.
> >
> > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> >
>
> This also looks reasonable, but if there is an index that doesn't want
> to support a parallel vacuum, it needs to set multiple flags.

Right. It would be better to use the uint16 as two uint8 fields. I mean
that if the first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL,
and if the next 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_CLEANUP.
The other flags could be the following:

VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
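
A rough sketch of this two-byte interpretation (the mask values and
helper names are just assumptions to illustrate the idea, not patch
code):

/* low byte describes bulkdelete, high byte describes cleanup */
#define VACUUM_OPTION_PARALLEL_BULKDEL      0x0001
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
#define VACUUM_OPTION_PARALLEL_CLEANUP      0x0200

static bool
can_parallel_bulkdel(uint16 options)
{
    return (options & 0x00FF) != 0; /* zero low byte = no parallel bulkdelete */
}

static bool
can_parallel_cleanup(uint16 options)
{
    return (options & 0xFF00) != 0; /* zero high byte = no parallel cleanup */
}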

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > updated version patch that also incorporated some comments I got so
> > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > test the total delay time.
> > > >
> > > While reviewing the 0002, I got one doubt related to how we are
> > > dividing the maintainance_work_mem
> > >
> > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > +{
> > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > + lvshared->maintenance_work_mem_worker =
> > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > maintenance_work_mem;
> > > +}
> > > Is it fair to just consider the number of indexes which use
> > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > well.  My point is suppose there are 10 indexes which will use the
> > > maintenance_work_mem but we are launching just 2 workers then what is
> > > the point in dividing the maintenance_work_mem by 10.
> > >
> > > IMHO the calculation should be like this
> > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > maintenance_work_mem;
> > >
> > > Am I missing something?
> >
> > No, I think you're right. On the other hand I think that dividing it
> > by the number of indexes that will use the mantenance_work_mem makes
> > sense when parallel degree > the number of such indexes. Suppose the
> > table has 2 indexes and there are 10 workers then we should divide the
> > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > most 2 indexes that uses the maintenance_work_mem are processed in
> > parallel at a time.
> >
> Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).

Thanks! I'll fix it in the next version patch.

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 12 Nov 2019 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > > >
> > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > vacuumcleanup) can't be performed in parallel
> > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > performed in parallel (hash index will set this flag)
> > > >
> > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > will not do the cleanup in parallel right?
> > > >
> >
> > Yeah, but it is better to be explicit about this.
>
> VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
>

I am not sure if that is required.

> I think brin indexes
> will use this flag.
>

Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
it should work.

> It will end up with
> (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> VACUUM_OPTION_NO_PARALLEL, though.
>
> >
> > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > flag)
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > gin, gist, spgist, bloom will set this flag)
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > and bloom will set this flag)
> > > > >
> > > > > Does something like this make sense?
> > >
> > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > two flags instead: one for doing parallel cleanup when not performed
> > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > >
> >
> > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > what makes you think 4 is conditional.
>
> Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> 4 it doesn't need to set 3 because 4 means always doing cleanup in
> parallel.
>

Yeah, that makes sense.  They can just set 4.

> >
> > > That way, we
> > > can have flags as follows and index AM chooses two flags, one from the
> > > first two flags for bulk deletion and another from next three flags
> > > for cleanup.
> > >
> > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > >
> >
> > This also looks reasonable, but if there is an index that doesn't want
> > to support a parallel vacuum, it needs to set multiple flags.
>
> Right. It would be better to use uint16 as two uint8. I mean that if
> first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> could be followings:
>
> VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
>

Hmm, I think we should define these flags in the simplest way.
Your previous proposal sounds okay to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 12 Nov 2019 at 22:33, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > > performed in parallel (hash index will set this flag)
> > > > >
> > > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > > will not do the cleanup in parallel right?
> > > > >
> > >
> > > Yeah, but it is better to be explicit about this.
> >
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
> >
>
> I am not sure if that is required.
>
> > I think brin indexes
> > will use this flag.
> >
>
> Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
> it should work.
>
> > It will end up with
> > (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> > VACUUM_OPTION_NO_PARALLEL, though.
> >
> > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > > >
> > > > > > Does something like this make sense?
> > > >
> > > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > > two flags instead: one for doing parallel cleanup when not performed
> > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > > >
> > >
> > > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > > what makes you think 4 is conditional.
> >
> > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> > 4 it doesn't need to set 3 because 4 means always doing cleanup in
> > parallel.
> >
>
> Yeah, that makes sense.  They can just set 4.

Okay,

>
> > >
> > > > That way, we
> > > > can have flags as follows and index AM chooses two flags, one from the
> > > > first two flags for bulk deletion and another from next three flags
> > > > for cleanup.
> > > >
> > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > > >
> > >
> > > This also looks reasonable, but if there is an index that doesn't want
> > > to support a parallel vacuum, it needs to set multiple flags.
> >
> > Right. It would be better to use uint16 as two uint8. I mean that if
> > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> > could be followings:
> >
> > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
> >
>
> Hmm, I think we should define these flags in the most simple way.
> Your previous proposal sounds okay to me.

Okay. As you mentioned before, my previous proposal won't work for
existing index AMs that don't set amparallelvacuumoptions. But since we
have amcanparallelvacuum, which is false by default, I think we don't
need to worry about the backward compatibility problem. Existing index
AMs will use neither parallel bulk-deletion nor parallel cleanup by
default. When they want to support parallel vacuum they will set
amparallelvacuumoptions as well as amcanparallelvacuum.

I'll try to use my previous proposal and check it. If something goes
wrong we can go back to your proposal or others.


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 12 Nov 2019 at 22:33, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Hmm, I think we should define these flags in the most simple way.
> > Your previous proposal sounds okay to me.
>
> Okay. As you mentioned before, my previous proposal won't work for
> existing index AMs that don't set amparallelvacuumoptions.
>

You mean to say it won't work because it has to set multiple flags,
which means that if an IndexAM user doesn't set the value of
amparallelvacuumoptions then it won't work?

> But since we
> have amcanparallelvacuum which is false by default I think we don't
> need to worry about backward compatibility problem. The existing index
> AM will use neither parallel bulk-deletion nor parallel cleanup by
> default. When it wants to support parallel vacuum they will set
> amparallelvacuumoptions as well as amcanparallelvacuum.
>

Hmm, I was not thinking of multiple variables, rather only one
variable. The default value should indicate that the IndexAM doesn't
support a parallel vacuum.  It might be that we need to do it the way
I originally proposed, with the different values of amparallelvacuumoptions,
or maybe some variant of it where the default value can clearly say
that the IndexAM doesn't support a parallel vacuum.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Nov 12, 2019 at 7:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > > performed in parallel (hash index will set this flag)
> > > > >
> > > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > > will not do the cleanup in parallel right?
> > > > >
> > >
> > > Yeah, but it is better to be explicit about this.
> >
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
> >
>
> I am not sure if that is required.
>
> > I think brin indexes
> > will use this flag.
> >
>
> Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
> it should work.

IIUC, VACUUM_OPTION_PARALLEL_CLEANUP means no parallel bulk delete and
always parallel cleanup?  I am not sure whether this is the best way,
because for the cleanup option we are being explicit about each choice,
i.e. PARALLEL_CLEANUP, NO_PARALLEL_CLEANUP, etc.; then why not the same
for bulk delete?  I mean, why don't we keep both PARALLEL_BULKDEL
and NO_PARALLEL_BULKDEL?

>
> > It will end up with
> > (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> > VACUUM_OPTION_NO_PARALLEL, though.
> >
> > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > > >
> > > > > > Does something like this make sense?
> > > >
> > > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > > two flags instead: one for doing parallel cleanup when not performed
> > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > > >
> > >
> > > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > > what makes you think 4 is conditional.
> >
> > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> > 4 it doesn't need to set 3 because 4 means always doing cleanup in
> > parallel.
> >
>
> Yeah, that makes sense.  They can just set 4.
>
> > >
> > > > That way, we
> > > > can have flags as follows and index AM chooses two flags, one from the
> > > > first two flags for bulk deletion and another from next three flags
> > > > for cleanup.
> > > >
> > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > > >
> > >
> > > This also looks reasonable, but if there is an index that doesn't want
> > > to support a parallel vacuum, it needs to set multiple flags.
> >
> > Right. It would be better to use uint16 as two uint8. I mean that if
> > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> > could be followings:
> >
> > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
> >
>
> Hmm, I think we should define these flags in the most simple way.
> Your previous proposal sounds okay to me.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 13 Nov 2019 at 11:38, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 12 Nov 2019 at 22:33, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Hmm, I think we should define these flags in the most simple way.
> > > Your previous proposal sounds okay to me.
> >
> > Okay. As you mentioned before, my previous proposal won't work for
> > existing index AMs that don't set amparallelvacuumoptions.
> >
>
> You mean to say it won't work because it has to set multiple flags
> which means that if IndexAm user doesn't set the value of
> amparallelvacuumoptions then it won't work?

Yes. In my previous proposal every index AM needs to set two flags.

>
> > But since we
> > have amcanparallelvacuum which is false by default I think we don't
> > need to worry about backward compatibility problem. The existing index
> > AM will use neither parallel bulk-deletion nor parallel cleanup by
> > default. When it wants to support parallel vacuum they will set
> > amparallelvacuumoptions as well as amcanparallelvacuum.
> >
>
> Hmm, I was not thinking of multiple variables rather only one
> variable. The default value should indicate that IndexAm doesn't
> support a parallel vacuum.

Yes.

> It might be that we need to do it the way
> I originally proposed the different values of amparallelvacuumoptions
> or maybe some variant of it where the default value can clearly say
> that IndexAm doesn't support a parallel vacuum.

Okay. After more thought on your original proposal, what confuses me
is that it mixes two kinds of flags: ones that enable options and one
that disables them. Looking at 2, 3 and 4, it looks like all options
are disabled by default and setting these flags enables them. On the
other hand, looking at 1, it looks like these options are enabled by
default and setting the flag disables it. 0 makes sense to me. So how
about having 0, 2, 3 and 4?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > updated version patch that also incorporated some comments I got so
> > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > test the total delay time.
> > > > >
> > > > While reviewing the 0002, I got one doubt related to how we are
> > > > dividing the maintainance_work_mem
> > > >
> > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > +{
> > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > + lvshared->maintenance_work_mem_worker =
> > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > maintenance_work_mem;
> > > > +}
> > > > Is it fair to just consider the number of indexes which use
> > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > well.  My point is suppose there are 10 indexes which will use the
> > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > the point in dividing the maintenance_work_mem by 10.
> > > >
> > > > IMHO the calculation should be like this
> > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > maintenance_work_mem;
> > > >
> > > > Am I missing something?
> > >
> > > No, I think you're right. On the other hand I think that dividing it
> > > by the number of indexes that will use the mantenance_work_mem makes
> > > sense when parallel degree > the number of such indexes. Suppose the
> > > table has 2 indexes and there are 10 workers then we should divide the
> > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > parallel at a time.
> > >
> > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
>
> Thanks! I'll fix it in the next version patch.
>
One more comment.

+lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
+ int nindexes, IndexBulkDeleteResult **stats,
+ LVParallelState *lps)
+{
+ ....

+ if (ParallelVacuumIsActive(lps))
+ {

+
+ lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+ stats, lps);
+
+ }
+
+ for (idx = 0; idx < nindexes; idx++)
+ {
+ /*
+ * Skip indexes that we have already vacuumed during parallel index
+ * vacuuming.
+ */
+ if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
+ continue;
+
+ lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
+   vacrelstats->old_live_tuples);
+ }
+}

In this function, if ParallelVacuumIsActive, we perform the parallel
vacuum for all the indexes for which parallel vacuum is supported, and
once that is over we finish vacuuming the remaining indexes for which
parallel vacuum is not supported.  But my question is that inside
lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
to finish their job and only then start the sequential vacuuming.
Shouldn't we start that immediately, as soon as the leader's
participation in the parallel vacuum is over?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 8:34 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 13 Nov 2019 at 11:38, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > It might be that we need to do it the way
> > I originally proposed the different values of amparallelvacuumoptions
> > or maybe some variant of it where the default value can clearly say
> > that IndexAm doesn't support a parallel vacuum.
>
> Okay. After more thoughts on your original proposal, what I get
> confused on your proposal is that there are two types of flags that
> enable and disable options. Looking at 2, 3 and 4, it looks like all
> options are disabled by default and setting these flags means to
> enable them. On the other hand looking at 1, it looks like these
> options are enabled by default and setting the flag means to disable
> it. 0 makes sense to me. So how about having 0, 2, 3 and 4?
>

Yeah, 0, 2, 3 and 4 sound reasonable to me.  Earlier, Dilip also got
confused by option 1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Nov 13, 2019 at 9:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > > updated version patch that also incorporated some comments I got so
> > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > > test the total delay time.
> > > > > >
> > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > dividing the maintainance_work_mem
> > > > >
> > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > +{
> > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > + lvshared->maintenance_work_mem_worker =
> > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > maintenance_work_mem;
> > > > > +}
> > > > > Is it fair to just consider the number of indexes which use
> > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > the point in dividing the maintenance_work_mem by 10.
> > > > >
> > > > > IMHO the calculation should be like this
> > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > maintenance_work_mem;
> > > > >
> > > > > Am I missing something?
> > > >
> > > > No, I think you're right. On the other hand I think that dividing it
> > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > parallel at a time.
> > > >
> > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
> >
> > Thanks! I'll fix it in the next version patch.
> >
> One more comment.
>
> +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps)
> +{
> + ....
>
> + if (ParallelVacuumIsActive(lps))
> + {
>
> +
> + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> + stats, lps);
> +
> + }
> +
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + /*
> + * Skip indexes that we have already vacuumed during parallel index
> + * vacuuming.
> + */
> + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> + continue;
> +
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + }
> +}
>
> In this function, if ParallelVacuumIsActive, we perform the parallel
> vacuum for all the index for which parallel vacuum is supported and
> once that is over we finish vacuuming remaining indexes for which
> parallel vacuum is not supported.  But, my question is that inside
> lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> to finish their job then only we start with the sequential vacuuming
> shouldn't we start that immediately as soon as the leader
> participation is over in the parallel vacuum?
>

+ /*
+ * Since parallel workers cannot access data in temporary tables, parallel
+ * vacuum is not allowed for temporary relation.
+ */
+ if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
+ {
+ ereport(WARNING,
+ (errmsg("skipping vacuum on \"%s\" --- cannot vacuum temporary
tables in parallel",
+ RelationGetRelationName(onerel))));
+ relation_close(onerel, lmode);
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ /* It's OK to proceed with ANALYZE on this table */
+ return true;
+ }
+

If we cannot support parallel vacuum for a temporary table, then
shouldn't we fall back to a normal vacuum instead of skipping the
table?  I think it's not fair that if the user has requested a
system-wide parallel vacuum then all the temp tables will be skipped
and not vacuumed at all, and the user then needs to perform a normal
vacuum on those tables again.
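
In other words, something along these lines instead of returning
early (just a rough sketch based on the excerpt above; the warning
text and using nworkers = -1 to mean "parallel off" are only
illustrative):

/*
 * Parallel workers cannot access data in temporary tables, so fall
 * back to a non-parallel vacuum rather than skipping the table.
 */
if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
{
    ereport(WARNING,
            (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
                    RelationGetRelationName(onerel))));
    params->nworkers = -1;    /* proceed with a normal (serial) vacuum */
}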

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Yeah, 0,2,3 and 4 sounds reasonable to me.  Earlier, Dilip also got
> confused with option 1.
>

Let me try to summarize the discussion on this point and see if others
have any opinion on this matter.

We need a way to allow an IndexAM to specify whether it can participate
in a parallel vacuum.  As we know, there are two phases of
index vacuum, bulkdelete and vacuumcleanup, and in many cases the
bulkdelete performs the main deletion work and then vacuumcleanup just
returns index statistics. So, for such cases, we don't want the second
phase to be performed by a parallel vacuum worker.  Now, if the
bulkdelete phase is not performed, then vacuumcleanup can process the
entire index, in which case it is better to do that phase via a
parallel worker.

OTOH, in some cases vacuumcleanup takes another pass over the index to
reclaim empty pages and record them in the FSM even if
bulkdelete is performed.  This happens in gin and bloom indexes.
Then, we have indexes where we do all the work in the cleanup phase, as
in the case of brin indexes.  Now, for this category of indexes, we
want the vacuumcleanup phase to also be performed by a parallel worker.

In short, different indexes have different requirements for which phase
of index vacuum can be performed in parallel.  Just to be clear, we
can't perform both phases (bulkdelete and cleanup) in one go, as
bulkdelete can happen multiple times on a large index whereas
vacuumcleanup is done once at the end.

Based on these needs, we came up with a way to allow users to specify
this information for IndexAMs. Basically, an index AM will expose a
variable amparallelvacuumoptions which can have the options below:

VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
vacuumcleanup) can't be performed in parallel
VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
flag)
VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
gin, gist,
spgist, bloom will set this flag)
VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
parallel even if bulkdelete is already performed (Indexes gin, brin,
and bloom will set this flag)

We have discussed exposing this information via two variables, but the
above seems like a better idea to all the people involved.
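
To make this concrete, here is a rough sketch of how the above could
look in code (bit values exactly as listed; where the definitions
would live is not decided here) and how, say, nbtree would set it in
its handler function:

/* proposed bits for amparallelvacuumoptions */
#define VACUUM_OPTION_NO_PARALLEL           (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 2)
#define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 3)

/*
 * nbtree: bulkdelete can run in a parallel worker, and cleanup can
 * run in a parallel worker only when no bulkdelete was performed.
 */
amroutine->amparallelvacuumoptions =
    VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;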

Any suggestions?  Does anyone think this is not the right way to expose
this information, that there is no need to expose it, or
have a better idea for this?

Sawada-San, Dilip, feel free to correct me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > > updated version patch that also incorporated some comments I got so
> > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > > test the total delay time.
> > > > > >
> > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > dividing the maintainance_work_mem
> > > > >
> > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > +{
> > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > + lvshared->maintenance_work_mem_worker =
> > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > maintenance_work_mem;
> > > > > +}
> > > > > Is it fair to just consider the number of indexes which use
> > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > the point in dividing the maintenance_work_mem by 10.
> > > > >
> > > > > IMHO the calculation should be like this
> > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > maintenance_work_mem;
> > > > >
> > > > > Am I missing something?
> > > >
> > > > No, I think you're right. On the other hand I think that dividing it
> > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > parallel at a time.
> > > >
> > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
> >
> > Thanks! I'll fix it in the next version patch.
> >
> One more comment.
>
> +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps)
> +{
> + ....
>
> + if (ParallelVacuumIsActive(lps))
> + {
>
> +
> + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> + stats, lps);
> +
> + }
> +
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + /*
> + * Skip indexes that we have already vacuumed during parallel index
> + * vacuuming.
> + */
> + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> + continue;
> +
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + }
> +}
>
> In this function, if ParallelVacuumIsActive, we perform the parallel
> vacuum for all the index for which parallel vacuum is supported and
> once that is over we finish vacuuming remaining indexes for which
> parallel vacuum is not supported.  But, my question is that inside
> lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> to finish their job then only we start with the sequential vacuuming
> shouldn't we start that immediately as soon as the leader
> participation is over in the parallel vacuum?

If we do that, then while the leader process is sequentially vacuuming
indexes that don't support parallel vacuum, some workers might still be
vacuuming other indexes. Isn't that a problem? If it's not a problem,
I think we can tie the indexes that don't support parallel vacuum to
the leader and do the parallel index vacuum.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Nov 12, 2019 at 3:14 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Mon, 11 Nov 2019 at 16:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> > >
> > >
> > > For small indexes also, we gained some performance by parallel vacuum.
> > >
> >
> > Thanks for doing all these tests.  It is clear with this and previous
> > tests that this patch has benefit in wide variety of cases.  However,
> > we should try to see some worst cases as well.  For example, if there
> > are multiple indexes on a table and only one of them is large whereas
> > all others are very small say having a few 100 or 1000 rows.
> >
>
> Thanks Amit for your comments.
>
> I did some testing on the above suggested lines. Below is the summary:
> Test case:(I created 16 indexes but only 1 index is large, other are very small)
> create table test(a int, b int, c int, d int, e int, f int, g int, h int);
> create index i3 on test (a) where a > 2000 and a < 3000;
> create index i4 on test (a) where a > 3000 and a < 4000;
> create index i5 on test (a) where a > 4000 and a < 5000;
> create index i6 on test (a) where a > 5000 and a < 6000;
> create index i7 on test (b) where a < 1000;
> create index i8 on test (c) where a < 1000;
> create index i9 on test (d) where a < 1000;
> create index i10 on test (d) where a < 1000;
> create index i11 on test (d) where a < 1000;
> create index i12 on test (d) where a < 1000;
> create index i13 on test (d) where a < 1000;
> create index i14 on test (d) where a < 1000;
> create index i15 on test (d) where a < 1000;
> create index i16 on test (d) where a < 1000;
> insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
> delete from test where a %2=0;
>
> case 1: vacuum without using parallel workers.
> vacuum test;
> 228.259 ms
>
> case 2: vacuum with 1 parallel worker.
> vacuum (parallel 1) test;
> 251.725 ms
>
> case 3: vacuum with 3 parallel workers.
> vacuum (parallel 3) test;
> 259.986
>
> From above results, it seems that if indexes are small, then parallel vacuum is not beneficial as compared to normal
vacuum.
>

Right, and that is what is expected as well.  However, I think it
would be better if we could somehow disallow very small indexes from
using a parallel worker.  Can we use min_parallel_index_scan_size to
decide whether a particular index can participate in a parallel
vacuum?
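
For instance, something along these lines when choosing which indexes
take part (just a sketch; Irel and the loop index are as in the patch,
the placement of the check is not):

/*
 * Only indexes of at least min_parallel_index_scan_size blocks take
 * part in the parallel vacuum; smaller ones are left to the leader.
 */
if (RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
    continue;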


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Nov 13, 2019 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Yeah, 0,2,3 and 4 sounds reasonable to me.  Earlier, Dilip also got
> > confused with option 1.
> >
>
> Let me try to summarize the discussion on this point and see if others
> have any opinion on this matter.
>
> We need a way to allow IndexAm to specify whether it can participate
> in a parallel vacuum.  As we know there are two phases of
> index-vacuum, bulkdelete and vacuumcleanup and in many cases, the
> bulkdelete performs the main deletion work and then vacuumcleanup just
> returns index statistics. So, for such cases, we don't want the second
> phase to be performed by a parallel vacuum worker.  Now, if the
> bulkdelete phase is not performed, then vacuumcleanup can process the
> entire index in which case it is better to do that phase via parallel
> worker.
>
> OTOH, in some cases vacuumcleanup takes another pass over-index to
> reclaim empty pages and update record the same in FSM even if
> bulkdelete is performed.  This happens in gin and bloom indexes.
> Then, we have an index where we do all the work in cleanup phase like
> in the case of brin indexes.  Now, for this category of indexes, we
> want vacuumcleanup phase to be also performed by a parallel worker.
>
> In short different indexes have different requirements for which phase
> of index vacuum can be performed in parallel.  Just to be clear, we
> can't perform both the phases (bulkdelete and cleanup) in one-go as
> bulk-delete can happen multiple times on a large index whereas
> vacuumcleanup is done once at the end.
>
> Based on these needs, we came up with a way to allow users to specify
> this information for IndexAm's. Basically, Indexam will expose a
> variable amparallelvacuumoptions which can have below options
>
> VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> vacuumcleanup) can't be performed in parallel
> VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> flag)
> VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> gin, gist,
> spgist, bloom will set this flag)
> VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> parallel even if bulkdelete is already performed (Indexes gin, brin,
> and bloom will set this flag)
>
> We have discussed to expose this information via two variables but the
> above seems like a better idea to all the people involved.
>
> Any suggestions?  Anyone thinks this is not the right way to expose
> this information or there is no need to expose this information or
> they have a better idea for this?
>
> Sawada-San, Dilip, feel free to correct me.
Looks fine to me.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> > In this function, if ParallelVacuumIsActive, we perform the parallel
> > vacuum for all the index for which parallel vacuum is supported and
> > once that is over we finish vacuuming remaining indexes for which
> > parallel vacuum is not supported.  But, my question is that inside
> > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > to finish their job then only we start with the sequential vacuuming
> > shouldn't we start that immediately as soon as the leader
> > participation is over in the parallel vacuum?
>
> If we do that, while the leader process is vacuuming indexes that
> don't not support parallel vacuum sequentially some workers might be
> vacuuming for other indexes. Isn't it a problem?
>

Can you please explain what problem you see with that?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > > > updated version patch that also incorporated some comments I got so
> > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > > > test the total delay time.
> > > > > > >
> > > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > > dividing the maintainance_work_mem
> > > > > >
> > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > > +{
> > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > > + lvshared->maintenance_work_mem_worker =
> > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > > maintenance_work_mem;
> > > > > > +}
> > > > > > Is it fair to just consider the number of indexes which use
> > > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > > the point in dividing the maintenance_work_mem by 10.
> > > > > >
> > > > > > IMHO the calculation should be like this
> > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > > maintenance_work_mem;
> > > > > >
> > > > > > Am I missing something?
> > > > >
> > > > > No, I think you're right. On the other hand I think that dividing it
> > > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > > parallel at a time.
> > > > >
> > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
> > >
> > > Thanks! I'll fix it in the next version patch.
> > >
> > One more comment.
> >
> > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> > + int nindexes, IndexBulkDeleteResult **stats,
> > + LVParallelState *lps)
> > +{
> > + ....
> >
> > + if (ParallelVacuumIsActive(lps))
> > + {
> >
> > +
> > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > + stats, lps);
> > +
> > + }
> > +
> > + for (idx = 0; idx < nindexes; idx++)
> > + {
> > + /*
> > + * Skip indexes that we have already vacuumed during parallel index
> > + * vacuuming.
> > + */
> > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> > + continue;
> > +
> > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> > +   vacrelstats->old_live_tuples);
> > + }
> > +}
> >
> > In this function, if ParallelVacuumIsActive, we perform the parallel
> > vacuum for all the index for which parallel vacuum is supported and
> > once that is over we finish vacuuming remaining indexes for which
> > parallel vacuum is not supported.  But, my question is that inside
> > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > to finish their job then only we start with the sequential vacuuming
> > shouldn't we start that immediately as soon as the leader
> > participation is over in the parallel vacuum?
>
> If we do that, while the leader process is vacuuming indexes that
> don't not support parallel vacuum sequentially some workers might be
> vacuuming for other indexes. Isn't it a problem?

I am not sure what could be the problem.

> If it's not problem,
> I think we can tie up indexes that don't support parallel vacuum to
> the leader and do parallel index vacuum.

I am not sure whether we can do that or not, because if we do a
parallel vacuum from the leader for the indexes which don't support the
parallel option then we will unnecessarily allocate shared memory
for those indexes (index stats).  Moreover, I think it could also
cause a problem in a multi-pass vacuum if we try to copy their stats
into the shared memory.

I think the simple option would be that as soon as the leader's
participation is over we have a loop over all the indexes that don't
support parallelism in that phase, and after completing that we wait
for the parallel workers to finish.
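
Roughly the following ordering, in pseudo-code (the two middle
function names are only placeholders, and lps->pcxt is assumed to be
the parallel context stored in LVParallelState):

/* launch workers for the indexes that support this phase */
LaunchParallelWorkers(lps->pcxt);

/* leader's own share of the parallel-safe indexes */
leader_do_parallel_safe_indexes();

/* then, without waiting, the indexes that don't support this phase */
leader_do_nonparallel_indexes();

/* and only now wait for the workers */
WaitForParallelWorkersToFinish(lps->pcxt);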

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 13 Nov 2019 at 17:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > >
> > > In this function, if ParallelVacuumIsActive, we perform the parallel
> > > vacuum for all the index for which parallel vacuum is supported and
> > > once that is over we finish vacuuming remaining indexes for which
> > > parallel vacuum is not supported.  But, my question is that inside
> > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > > to finish their job then only we start with the sequential vacuuming
> > > shouldn't we start that immediately as soon as the leader
> > > participation is over in the parallel vacuum?
> >
> > If we do that, while the leader process is vacuuming indexes that
> > don't not support parallel vacuum sequentially some workers might be
> > vacuuming for other indexes. Isn't it a problem?
> >
>
> Can you please explain what problem do you see with that?

I think it depends on the index AM user's expectation. If disabling
parallel vacuum for an index means that the index AM user just doesn't
want the index to be vacuumed by a parallel worker, it's not a problem.
But if it means that the user doesn't want the index to be vacuumed
while other indexes are being processed in parallel, it's unexpected
behaviour for the user.
I'm probably worrying too much.

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 3:55 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 13 Nov 2019 at 17:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > >
> > > > In this function, if ParallelVacuumIsActive, we perform the parallel
> > > > vacuum for all the index for which parallel vacuum is supported and
> > > > once that is over we finish vacuuming remaining indexes for which
> > > > parallel vacuum is not supported.  But, my question is that inside
> > > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > > > to finish their job then only we start with the sequential vacuuming
> > > > shouldn't we start that immediately as soon as the leader
> > > > participation is over in the parallel vacuum?
> > >
> > > If we do that, while the leader process is vacuuming indexes that
> > > don't not support parallel vacuum sequentially some workers might be
> > > vacuuming for other indexes. Isn't it a problem?
> > >
> >
> > Can you please explain what problem do you see with that?
>
> I think it depends on index AM user expectation. If disabling parallel
> vacuum for an index means that index AM user doesn't just want to
> vacuum the index by parallel worker, it's not problem. But if it means
> that the user doesn't want to vacuum the index during other indexes is
>  being processed in parallel it's unexpected behaviour for the user.
>

I would expect the former.

> I'm probably worrying too much.
>

Yeah, we can keep the behavior as per your first expectation
(disabling parallel vacuum for an index just means the index AM user
doesn't want the index to be vacuumed by a parallel worker, so it's not
a problem).  It might not be difficult to change this later if an
example of such a case comes up.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 13 Nov 2019 at 18:49, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > >
> > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > > > > updated version patch that also incorporated some comments I got so
> > > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > > > > test the total delay time.
> > > > > > > >
> > > > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > > > dividing the maintainance_work_mem
> > > > > > >
> > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > > > +{
> > > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > > > + lvshared->maintenance_work_mem_worker =
> > > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > > > maintenance_work_mem;
> > > > > > > +}
> > > > > > > Is it fair to just consider the number of indexes which use
> > > > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > > > the point in dividing the maintenance_work_mem by 10.
> > > > > > >
> > > > > > > IMHO the calculation should be like this
> > > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > > > maintenance_work_mem;
> > > > > > >
> > > > > > > Am I missing something?
> > > > > >
> > > > > > No, I think you're right. On the other hand I think that dividing it
> > > > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > > > parallel at a time.
> > > > > >
> > > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
> > > >
> > > > Thanks! I'll fix it in the next version patch.
> > > >
> > > One more comment.
> > >
> > > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> > > + int nindexes, IndexBulkDeleteResult **stats,
> > > + LVParallelState *lps)
> > > +{
> > > + ....
> > >
> > > + if (ParallelVacuumIsActive(lps))
> > > + {
> > >
> > > +
> > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > > + stats, lps);
> > > +
> > > + }
> > > +
> > > + for (idx = 0; idx < nindexes; idx++)
> > > + {
> > > + /*
> > > + * Skip indexes that we have already vacuumed during parallel index
> > > + * vacuuming.
> > > + */
> > > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> > > + continue;
> > > +
> > > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> > > +   vacrelstats->old_live_tuples);
> > > + }
> > > +}
> > >
> > > In this function, if ParallelVacuumIsActive, we perform the parallel
> > > vacuum for all the index for which parallel vacuum is supported and
> > > once that is over we finish vacuuming remaining indexes for which
> > > parallel vacuum is not supported.  But, my question is that inside
> > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > > to finish their job then only we start with the sequential vacuuming
> > > shouldn't we start that immediately as soon as the leader
> > > participation is over in the parallel vacuum?
> >
> > If we do that, while the leader process is vacuuming indexes that
> > don't not support parallel vacuum sequentially some workers might be
> > vacuuming for other indexes. Isn't it a problem?
>
> I am not sure what could be the problem.
>
>  If it's not problem,
> > I think we can tie up indexes that don't support parallel vacuum to
> > the leader and do parallel index vacuum.
>
> I am not sure whether we can do that or not.  Because if we do a
> parallel vacuum from the leader for the indexes which don't support a
> parallel option then we will unnecessarily allocate the shared memory
> for those indexes (index stats).  Moreover, I think it could also
> cause a problem in a multi-pass vacuum if we try to copy its stats
> into the shared memory.
>
> I think simple option would be that as soon as leader participation is
> over we can have a loop for all the indexes who don't support
> parallelism in that phase and after completing that we wait for the
> parallel workers to finish.

Hmm, I thought we don't allocate DSM for indexes that support neither
parallel bulk deletion nor parallel cleanup, and that we can always
assign indexes to the leader process if they don't support a particular
phase during parallel index vacuuming. But your suggestion sounds
simpler. I'll incorporate it in the next version of the patch.
Thanks!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 9:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> + /*
> + * Since parallel workers cannot access data in temporary tables, parallel
> + * vacuum is not allowed for temporary relation.
> + */
> + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> + {
> + ereport(WARNING,
> + (errmsg("skipping vacuum on \"%s\" --- cannot vacuum temporary
> tables in parallel",
> + RelationGetRelationName(onerel))));
> + relation_close(onerel, lmode);
> + PopActiveSnapshot();
> + CommitTransactionCommand();
> + /* It's OK to proceed with ANALYZE on this table */
> + return true;
> + }
> +
>
> If we can not support the parallel vacuum for the temporary table then
> shouldn't we fall back to the normal vacuum instead of skipping the
> table.  I think it's not fair that if the user has given system-wide
> parallel vacuum then all the temp table will be skipped and not at all
> vacuumed then user need to again perform normal vacuum on those
> tables.
>

Good point.  However, I think the current coding also makes sense for
cases like "Vacuum (analyze, parallel 2) tmp_tab;".  In such a case,
it will skip the vacuum part but will still perform the analyze.
Having said that, I can see the merit of your point, and I also vote to
follow your suggestion and add a note to the documentation, unless it
makes the code look ugly.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Yeah, 0,2,3 and 4 sounds reasonable to me.  Earlier, Dilip also got
> > confused with option 1.
> >
>
> Let me try to summarize the discussion on this point and see if others
> have any opinion on this matter.

Thank you for summarizing.

>
> We need a way to allow IndexAm to specify whether it can participate
> in a parallel vacuum.  As we know there are two phases of
> index-vacuum, bulkdelete and vacuumcleanup and in many cases, the
> bulkdelete performs the main deletion work and then vacuumcleanup just
> returns index statistics. So, for such cases, we don't want the second
> phase to be performed by a parallel vacuum worker.  Now, if the
> bulkdelete phase is not performed, then vacuumcleanup can process the
> entire index in which case it is better to do that phase via parallel
> worker.
>
> OTOH, in some cases vacuumcleanup takes another pass over-index to
> reclaim empty pages and update record the same in FSM even if
> bulkdelete is performed.  This happens in gin and bloom indexes.
> Then, we have an index where we do all the work in cleanup phase like
> in the case of brin indexes.  Now, for this category of indexes, we
> want vacuumcleanup phase to be also performed by a parallel worker.
>
> In short different indexes have different requirements for which phase
> of index vacuum can be performed in parallel.  Just to be clear, we
> can't perform both the phases (bulkdelete and cleanup) in one-go as
> bulk-delete can happen multiple times on a large index whereas
> vacuumcleanup is done once at the end.
>
> Based on these needs, we came up with a way to allow users to specify
> this information for IndexAm's. Basically, Indexam will expose a
> variable amparallelvacuumoptions which can have below options
>
> VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> vacuumcleanup) can't be performed in parallel

I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs that
don't want to support parallel vacuum don't have to set anything.

> VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> flag)
> VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> gin, gist,
> spgist, bloom will set this flag)
> VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> parallel even if bulkdelete is already performed (Indexes gin, brin,
> and bloom will set this flag)

I think gin and bloom don't need to set both, but should set only
VACUUM_OPTION_PARALLEL_CLEANUP.

And I'm going to disallow index AMs from setting both
VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
by an assertion; is that okay?
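
For example, something like this (a sketch; "options" stands for the
value of the index AM's amparallelvacuumoptions):

/*
 * An index AM must not claim both conditional and unconditional
 * parallel cleanup at the same time.
 */
Assert(((options & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0) ||
       ((options & VACUUM_OPTION_PARALLEL_CLEANUP) == 0));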

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Based on these needs, we came up with a way to allow users to specify
> > this information for IndexAm's. Basically, Indexam will expose a
> > variable amparallelvacuumoptions which can have below options
> >
> > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > vacuumcleanup) can't be performed in parallel
>
> I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> want to support parallel vacuum don't have to set anything.
>

Makes sense.

> > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > flag)
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > gin, gist,
> > spgist, bloom will set this flag)
> > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > and bloom will set this flag)
>
> I think gin and bloom don't need to set both but should set only
> VACUUM_OPTION_PARALLEL_CLEANUP.
>
> And I'm going to disallow index AMs to set both
> VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> by assertions, is that okay?
>

Sounds reasonable to me.

Are you planning to include the changes related to I/O throttling
based on the discussion in the nearby thread [1]?  I think you can do
that if you agree with the conclusion in the last email [1]; otherwise,
we can explore it separately.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BuDgLwfnAhQWGpAe66D85PdkeBygZGVyX96%2BovN1PbOg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Based on these needs, we came up with a way to allow users to specify
> > > this information for IndexAm's. Basically, Indexam will expose a
> > > variable amparallelvacuumoptions which can have below options
> > >
> > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > > vacuumcleanup) can't be performed in parallel
> >
> > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > want to support parallel vacuum don't have to set anything.
> >
>
> make sense.
>
> > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > flag)
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > gin, gist,
> > > spgist, bloom will set this flag)
> > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > and bloom will set this flag)
> >
> > I think gin and bloom don't need to set both but should set only
> > VACUUM_OPTION_PARALLEL_CLEANUP.
> >
> > And I'm going to disallow index AMs to set both
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > by assertions, is that okay?
> >
>
> Sounds reasonable to me.
>
> Are you planning to include the changes related to I/O throttling
> based on the discussion in the nearby thread [1]?  I think you can do
> that if you agree with the conclusion in the last email[1], otherwise,
> we can explore it separately.

Yes, I agree. I'm going to include those changes in the next version
of the patches. And I think we will be able to have more discussion
based on the patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Based on these needs, we came up with a way to allow users to specify
> > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > variable amparallelvacuumoptions which can have below options
> > > >
> > > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > > > vacuumcleanup) can't be performed in parallel
> > >
> > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > want to support parallel vacuum don't have to set anything.
> > >
> >
> > make sense.
> >
> > > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > flag)
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > gin, gist,
> > > > spgist, bloom will set this flag)
> > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > and bloom will set this flag)
> > >
> > > I think gin and bloom don't need to set both but should set only
> > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > >
> > > And I'm going to disallow index AMs to set both
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > by assertions, is that okay?
> > >
> >
> > Sounds reasonable to me.
> >
> > Are you planning to include the changes related to I/O throttling
> > based on the discussion in the nearby thread [1]?  I think you can do
> > that if you agree with the conclusion in the last email[1], otherwise,
> > we can explore it separately.
>
> Yes I agreed. I'm going to include that changes in the next version
> patches. And I think we will be able to do more discussion based on
> the patch.
>

I've attached the latest version patch set. The patch set includes all
discussed points regarding index AM options as well as shared cost
balance. Also I added some test cases that use all types of index AM.

During development I had one concern about the number of parallel
workers to launch. In the current design each index AM can choose
whether it participates in parallel bulk-deletion and in parallel
cleanup. That also means the number of parallel workers to launch
might differ between the parallel bulk-deletion and parallel cleanup
passes. In the current patch the leader will always launch as many
workers as there are indexes that support either one, but that would
not be efficient in some cases. For example, if we have 3 indexes
supporting only parallel bulk-deletion and 2 indexes supporting only
parallel index cleanup, we would launch 5 workers for each execution
but some workers would do nothing at all. To deal with this problem,
I wonder if we can improve the parallel query infrastructure so that
the leader process creates a parallel context sized for the maximum
number of indexes and can launch only a part of the workers instead
of all of them.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> I've attached the latest version patch set. The patch set includes all
> discussed points regarding index AM options as well as shared cost
> balance. Also I added some test cases used all types of index AM.
>
> During developments I had one concern about the number of parallel
> workers to launch. In current design each index AMs can choose the
> participation of parallel bulk-deletion and parallel cleanup. That
> also means the number of parallel worker to launch might be different
> for each time of parallel bulk-deletion and parallel cleanup. In
> current patch the leader will always launch the number of indexes that
> support either one but it would not be efficient in some cases. For
> example, if we have 3 indexes supporting only parallel bulk-deletion
> and 2 indexes supporting only parallel index cleanup, we would launch
> 5 workers for each execution but some workers will do nothing at all.
> To deal with this problem, I wonder if we can improve the parallel
> query so that the leader process creates a parallel context with the
> maximum number of indexes and can launch a part of workers instead of
> all of them.
>

Can't we choose the number of workers as a maximum of
"num_of_indexes_that_support_bulk_del" and
"num_of_indexes_that_support_cleanup"?  If we can do that, then we can
always launch the required number of workers for each phase (bulk_del,
cleanup).  In your above example, it should choose 3 workers while
creating a parallel context.  Do you see any problem with that?
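
In code terms, something like the following when creating the
parallel context (the two count variables are only for illustration):

/*
 * One worker per index that can participate in the busier of the two
 * phases, capped by the usual limit.
 */
nworkers = Max(nindexes_parallel_bulkdel, nindexes_parallel_cleanup);
nworkers = Min(nworkers, max_parallel_maintenance_workers);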

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I've attached the latest version patch set. The patch set includes all
> > discussed points regarding index AM options as well as shared cost
> > balance. Also I added some test cases used all types of index AM.
> >
> > During developments I had one concern about the number of parallel
> > workers to launch. In current design each index AMs can choose the
> > participation of parallel bulk-deletion and parallel cleanup. That
> > also means the number of parallel worker to launch might be different
> > for each time of parallel bulk-deletion and parallel cleanup. In
> > current patch the leader will always launch the number of indexes that
> > support either one but it would not be efficient in some cases. For
> > example, if we have 3 indexes supporting only parallel bulk-deletion
> > and 2 indexes supporting only parallel index cleanup, we would launch
> > 5 workers for each execution but some workers will do nothing at all.
> > To deal with this problem, I wonder if we can improve the parallel
> > query so that the leader process creates a parallel context with the
> > maximum number of indexes and can launch a part of workers instead of
> > all of them.
> >
>
> Can't we choose the number of workers as a maximum of
> "num_of_indexes_that_support_bulk_del" and
> "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> always launch the required number of workers for each phase (bulk_del,
> cleanup).  In your above example, it should choose 3 workers while
> creating a parallel context.  Do you see any problem with that?

I might be missing something, but if we create the parallel context
with 3 workers the leader process always launches 3 workers. Therefore
in the above case it launches 3 workers even for cleanup, although 2
workers are enough.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > I've attached the latest version patch set. The patch set includes all
> > > discussed points regarding index AM options as well as shared cost
> > > balance. Also I added some test cases used all types of index AM.
> > >
> > > During developments I had one concern about the number of parallel
> > > workers to launch. In current design each index AMs can choose the
> > > participation of parallel bulk-deletion and parallel cleanup. That
> > > also means the number of parallel worker to launch might be different
> > > for each time of parallel bulk-deletion and parallel cleanup. In
> > > current patch the leader will always launch the number of indexes that
> > > support either one but it would not be efficient in some cases. For
> > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > 5 workers for each execution but some workers will do nothing at all.
> > > To deal with this problem, I wonder if we can improve the parallel
> > > query so that the leader process creates a parallel context with the
> > > maximum number of indexes and can launch a part of workers instead of
> > > all of them.
> > >
> >
> > Can't we choose the number of workers as a maximum of
> > "num_of_indexes_that_support_bulk_del" and
> > "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> > always launch the required number of workers for each phase (bulk_del,
> > cleanup).  In your above example, it should choose 3 workers while
> > creating a parallel context.  Do you see any problem with that?
>
> I might be missing something but if we create the parallel context
> with 3 workers the leader process always launches 3 workers. Therefore
> in the above case it launches 3 workers even in cleanup although 2
> workers is enough.
>

Right, so we can either extend the parallel API to launch fewer workers
than the parallel context has, as you suggested, or we can use a
separate parallel context for each phase.  Going with the former has
the benefit that we don't need to recreate the parallel context, and
the latter has the advantage that we won't keep additional shared
memory allocated.  BTW, what kind of API change do you have in mind for
the approach you are suggesting?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > I've attached the latest version patch set. The patch set includes all
> > > > discussed points regarding index AM options as well as shared cost
> > > > balance. Also I added some test cases used all types of index AM.
> > > >
> > > > During developments I had one concern about the number of parallel
> > > > workers to launch. In current design each index AMs can choose the
> > > > participation of parallel bulk-deletion and parallel cleanup. That
> > > > also means the number of parallel worker to launch might be different
> > > > for each time of parallel bulk-deletion and parallel cleanup. In
> > > > current patch the leader will always launch the number of indexes that
> > > > support either one but it would not be efficient in some cases. For
> > > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > > 5 workers for each execution but some workers will do nothing at all.
> > > > To deal with this problem, I wonder if we can improve the parallel
> > > > query so that the leader process creates a parallel context with the
> > > > maximum number of indexes and can launch a part of workers instead of
> > > > all of them.
> > > >
> > >
> > > Can't we choose the number of workers as a maximum of
> > > "num_of_indexes_that_support_bulk_del" and
> > > "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> > > always launch the required number of workers for each phase (bulk_del,
> > > cleanup).  In your above example, it should choose 3 workers while
> > > creating a parallel context.  Do you see any problem with that?
> >
> > I might be missing something but if we create the parallel context
> > with 3 workers the leader process always launches 3 workers. Therefore
> > in the above case it launches 3 workers even in cleanup although 2
> > workers is enough.
> >
>
> Right, so we can either extend parallel API to launch fewer workers
> than it has in parallel context as suggested by you or we can use
> separate parallel context for each phase.  Going with the earlier has
> the benefit that we don't need to recreate the parallel context and
> the latter has the advantage that we won't keep additional shared
> memory allocated.

I also thought about using separate parallel contexts for each phase, but
can the same DSM be used by parallel workers that were initiated from
different parallel contexts? If not, I think that doesn't work because
parallel vacuum needs to set data in the DSM during ambulkdelete and the
parallel workers for amvacuumcleanup then need to access it.

>  BTW, what kind of API change you have in mind for
> the approach you are suggesting?

I was thinking of adding a new API, say LaunchParallelNWorkers(pcxt, n),
where n is the number of workers the caller wants to launch, which
should not exceed the value in the parallel context.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Nov 21, 2019 at 6:53 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > I've attached the latest version patch set. The patch set includes all
> > > > > discussed points regarding index AM options as well as shared cost
> > > > > balance. Also I added some test cases used all types of index AM.
> > > > >
> > > > > During developments I had one concern about the number of parallel
> > > > > workers to launch. In current design each index AMs can choose the
> > > > > participation of parallel bulk-deletion and parallel cleanup. That
> > > > > also means the number of parallel worker to launch might be different
> > > > > for each time of parallel bulk-deletion and parallel cleanup. In
> > > > > current patch the leader will always launch the number of indexes that
> > > > > support either one but it would not be efficient in some cases. For
> > > > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > > > 5 workers for each execution but some workers will do nothing at all.
> > > > > To deal with this problem, I wonder if we can improve the parallel
> > > > > query so that the leader process creates a parallel context with the
> > > > > maximum number of indexes and can launch a part of workers instead of
> > > > > all of them.
> > > > >
> > > >
> > > > Can't we choose the number of workers as a maximum of
> > > > "num_of_indexes_that_support_bulk_del" and
> > > > "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> > > > always launch the required number of workers for each phase (bulk_del,
> > > > cleanup).  In your above example, it should choose 3 workers while
> > > > creating a parallel context.  Do you see any problem with that?
> > >
> > > I might be missing something but if we create the parallel context
> > > with 3 workers the leader process always launches 3 workers. Therefore
> > > in the above case it launches 3 workers even in cleanup although 2
> > > workers is enough.
> > >
> >
> > Right, so we can either extend parallel API to launch fewer workers
> > than it has in parallel context as suggested by you or we can use
> > separate parallel context for each phase.  Going with the earlier has
> > the benefit that we don't need to recreate the parallel context and
> > the latter has the advantage that we won't keep additional shared
> > memory allocated.
>
> I also thought to use separate parallel contexts for each phase but
> can the same DSM be used by parallel workers  who initiated from
> different parallel contexts? If not I think that doesn't work because
> the parallel vacuum needs to set data to DSM of ambulkdelete and then
> parallel workers for amvacuumcleanup needs to access it.
>

We can probably copy the stats into local memory instead of pointing
them to the DSM after bulk-deletion, but I think that would be
unnecessary overhead and doesn't sound like a good idea.

> >  BTW, what kind of API change you have in mind for
> > the approach you are suggesting?
>
> I was thinking to add a new API, say LaunchParallelNWorkers(pcxt, n),
> where n is the number of workers the caller wants to launch and should
> be lower than the value in the parallel context.
>

For that, won't you need to duplicate most of the code of
LaunchParallelWorkers, or maybe move the entire code into
LaunchParallelNWorkers so that LaunchParallelWorkers can also call
it?  Another idea could be to just extend the existing API
LaunchParallelWorkers to take the number of workers as an input
parameter.  Do you see any problem with that, or is there a reason you
prefer to write a new API for this?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Nov 21, 2019 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Nov 21, 2019 at 6:53 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > >
> > > > > > I've attached the latest version patch set. The patch set includes all
> > > > > > discussed points regarding index AM options as well as shared cost
> > > > > > balance. Also I added some test cases used all types of index AM.
> > > > > >
> > > > > > During developments I had one concern about the number of parallel
> > > > > > workers to launch. In current design each index AMs can choose the
> > > > > > participation of parallel bulk-deletion and parallel cleanup. That
> > > > > > also means the number of parallel worker to launch might be different
> > > > > > for each time of parallel bulk-deletion and parallel cleanup. In
> > > > > > current patch the leader will always launch the number of indexes that
> > > > > > support either one but it would not be efficient in some cases. For
> > > > > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > > > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > > > > 5 workers for each execution but some workers will do nothing at all.
> > > > > > To deal with this problem, I wonder if we can improve the parallel
> > > > > > query so that the leader process creates a parallel context with the
> > > > > > maximum number of indexes and can launch a part of workers instead of
> > > > > > all of them.
> > > > > >
> > > > >
> > > > > Can't we choose the number of workers as a maximum of
> > > > > "num_of_indexes_that_support_bulk_del" and
> > > > > "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> > > > > always launch the required number of workers for each phase (bulk_del,
> > > > > cleanup).  In your above example, it should choose 3 workers while
> > > > > creating a parallel context.  Do you see any problem with that?
> > > >
> > > > I might be missing something but if we create the parallel context
> > > > with 3 workers the leader process always launches 3 workers. Therefore
> > > > in the above case it launches 3 workers even in cleanup although 2
> > > > workers is enough.
> > > >
> > >
> > > Right, so we can either extend parallel API to launch fewer workers
> > > than it has in parallel context as suggested by you or we can use
> > > separate parallel context for each phase.  Going with the earlier has
> > > the benefit that we don't need to recreate the parallel context and
> > > the latter has the advantage that we won't keep additional shared
> > > memory allocated.
> >
> > I also thought to use separate parallel contexts for each phase but
> > can the same DSM be used by parallel workers  who initiated from
> > different parallel contexts? If not I think that doesn't work because
> > the parallel vacuum needs to set data to DSM of ambulkdelete and then
> > parallel workers for amvacuumcleanup needs to access it.
> >
>
> We can probably copy the stats in local memory instead of pointing it
> to dsm after bulk-deletion, but I think that would unnecessary
> overhead and doesn't sound like a good idea.

I agree that it will be unnecessary overhead.

>
> > >  BTW, what kind of API change you have in mind for
> > > the approach you are suggesting?
> >
> > I was thinking to add a new API, say LaunchParallelNWorkers(pcxt, n),
> > where n is the number of workers the caller wants to launch and should
> > be lower than the value in the parallel context.
> >
>
> For that won't you need to duplicate most of the code of
> LaunchParallelWorkers or maybe move the entire code in
> LaunchParallelNWorkers and then LaunchParallelWorkers can also call
> it.  Another idea could be to just extend the existing API
> LaunchParallelWorkers to take input parameter as the number of
> workers, do you see any problem with that or is there a reason you
> prefer to write a new API for this?

I think we can pass an extra parameter to LaunchParallelWorkers and,
within it, try to launch min(pcxt->nworkers, n).  Or we can put an
assert (n <= pcxt->nworkers).
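
Just to make the capping idea concrete, here is a tiny standalone model
(not actual PostgreSQL code; the type and function names are invented
purely for illustration) of launching min(pcxt->nworkers, n) workers:

#include <stdio.h>

/* Toy stand-in for a parallel context created with nworkers slots. */
typedef struct ToyParallelContext
{
    int nworkers;           /* workers the context was sized for */
    int nworkers_launched;  /* workers actually started */
} ToyParallelContext;

static int
toy_launch_workers(ToyParallelContext *pcxt, int nworkers_to_launch)
{
    /* Cap the request at the number of slots the context was created with. */
    int n = (nworkers_to_launch < pcxt->nworkers) ? nworkers_to_launch
                                                  : pcxt->nworkers;

    for (int i = 0; i < n; i++)
        pcxt->nworkers_launched++;  /* real code would start a bgworker here */

    return pcxt->nworkers_launched;
}

int
main(void)
{
    ToyParallelContext pcxt = {3, 0};   /* context sized for 3 workers */

    /* bulk-deletion needs all 3 workers, cleanup only 2 */
    printf("bulkdelete: launched %d\n", toy_launch_workers(&pcxt, 3));
    pcxt.nworkers_launched = 0;
    printf("cleanup:    launched %d\n", toy_launch_workers(&pcxt, 2));
    return 0;
}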

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > >
> > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > > variable amparallelvacuumoptions which can have below options
> > > > >
> > > > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > > > > vacuumcleanup) can't be performed in parallel
> > > >
> > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > want to support parallel vacuum don't have to set anything.
> > > >
> > >
> > > make sense.
> > >
> > > > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > flag)
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > gin, gist,
> > > > > spgist, bloom will set this flag)
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > and bloom will set this flag)
> > > >
> > > > I think gin and bloom don't need to set both but should set only
> > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > >
> > > > And I'm going to disallow index AMs to set both
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > by assertions, is that okay?
> > > >
> > >
> > > Sounds reasonable to me.
> > >
> > > Are you planning to include the changes related to I/O throttling
> > > based on the discussion in the nearby thread [1]?  I think you can do
> > > that if you agree with the conclusion in the last email[1], otherwise,
> > > we can explore it separately.
> >
> > Yes I agreed. I'm going to include that changes in the next version
> > patches. And I think we will be able to do more discussion based on
> > the patch.
> >
>
> I've attached the latest version patch set. The patch set includes all
> discussed points regarding index AM options as well as shared cost
> balance. Also I added some test cases used all types of index AM.
>
> During developments I had one concern about the number of parallel
> workers to launch. In current design each index AMs can choose the
> participation of parallel bulk-deletion and parallel cleanup. That
> also means the number of parallel worker to launch might be different
> for each time of parallel bulk-deletion and parallel cleanup. In
> current patch the leader will always launch the number of indexes that
> support either one but it would not be efficient in some cases. For
> example, if we have 3 indexes supporting only parallel bulk-deletion
> and 2 indexes supporting only parallel index cleanup, we would launch
> 5 workers for each execution but some workers will do nothing at all.
> To deal with this problem, I wonder if we can improve the parallel
> query so that the leader process creates a parallel context with the
> maximum number of indexes and can launch a part of workers instead of
> all of them.
>
+
+ /* compute new balance by adding the local value */
+ shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ new_balance = shared_balance + VacuumCostBalance;

+ /* also compute the total local balance */
+ local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
+
+ if ((new_balance >= VacuumCostLimit) &&
+ (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
+ {
+ /* compute sleep time based on the local cost balance */
+ msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
+ new_balance = shared_balance - VacuumCostBalanceLocal;
+ VacuumCostBalanceLocal = 0;
+ }
+
+ if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
+    &shared_balance,
+    new_balance))
+ {
+ /* Updated successfully, break */
+ break;
+ }
While looking at the shared costing delay part, I have noticed that
when checking the delay condition we consider local_balance, which is
VacuumCostBalanceLocal + VacuumCostBalance, but when computing the new
balance we only reduce the shared balance by VacuumCostBalanceLocal.  I
think it should be reduced by local_balance?  I see that later we add
VacuumCostBalance to VacuumCostBalanceLocal, so we are not losing the
accounting for this balance.  But I feel it is not right that we
compare based on one value and operate based on another.  I think we
can immediately set VacuumCostBalanceLocal += VacuumCostBalance before
checking the condition.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Nov 21, 2019 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > > > variable amparallelvacuumoptions which can have below options
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > >
> > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > > want to support parallel vacuum don't have to set anything.
> > > > >
> > > >
> > > > make sense.
> > > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > gin, gist,
> > > > > > spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > >
> > > > > I think gin and bloom don't need to set both but should set only
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > >
> > > > > And I'm going to disallow index AMs to set both
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > by assertions, is that okay?
> > > > >
> > > >
> > > > Sounds reasonable to me.
> > > >
> > > > Are you planning to include the changes related to I/O throttling
> > > > based on the discussion in the nearby thread [1]?  I think you can do
> > > > that if you agree with the conclusion in the last email[1], otherwise,
> > > > we can explore it separately.
> > >
> > > Yes I agreed. I'm going to include that changes in the next version
> > > patches. And I think we will be able to do more discussion based on
> > > the patch.
> > >
> >
> > I've attached the latest version patch set. The patch set includes all
> > discussed points regarding index AM options as well as shared cost
> > balance. Also I added some test cases used all types of index AM.
> >
> > During developments I had one concern about the number of parallel
> > workers to launch. In current design each index AMs can choose the
> > participation of parallel bulk-deletion and parallel cleanup. That
> > also means the number of parallel worker to launch might be different
> > for each time of parallel bulk-deletion and parallel cleanup. In
> > current patch the leader will always launch the number of indexes that
> > support either one but it would not be efficient in some cases. For
> > example, if we have 3 indexes supporting only parallel bulk-deletion
> > and 2 indexes supporting only parallel index cleanup, we would launch
> > 5 workers for each execution but some workers will do nothing at all.
> > To deal with this problem, I wonder if we can improve the parallel
> > query so that the leader process creates a parallel context with the
> > maximum number of indexes and can launch a part of workers instead of
> > all of them.
> >
> +
> + /* compute new balance by adding the local value */
> + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> + new_balance = shared_balance + VacuumCostBalance;
>
> + /* also compute the total local balance */
> + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> +
> + if ((new_balance >= VacuumCostLimit) &&
> + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> + {
> + /* compute sleep time based on the local cost balance */
> + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> + new_balance = shared_balance - VacuumCostBalanceLocal;
> + VacuumCostBalanceLocal = 0;
> + }
> +
> + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> +    &shared_balance,
> +    new_balance))
> + {
> + /* Updated successfully, break */
> + break;
> + }
> While looking at the shared costing delay part, I have noticed that
> while checking the delay condition, we are considering local_balance
> which is VacuumCostBalanceLocal + VacuumCostBalance, but while
> computing the new balance we only reduce shared balance by
> VacuumCostBalanceLocal,  I think it should be reduced with
> local_balance?  I see that later we are adding VacuumCostBalance to
> the VacuumCostBalanceLocal so we are not loosing accounting for this
> balance.  But, I feel it is not right that we compare based on one
> value and operate based on other. I think we can immediately set
> VacuumCostBalanceLocal += VacuumCostBalance before checking the
> condition.
>

+/*
+ * index_parallelvacuum_estimate - estimate shared memory for parallel vacuum
+ *
+ * Currently, we don't pass any information to the AM-specific estimator,
+ * so it can probably only return a constant.  In the future, we might need
+ * to pass more information.
+ */
+Size
+index_parallelvacuum_estimate(Relation indexRelation)
+{
+ Size nbytes;
+
+ RELATION_CHECKS;
+
+ /*
+ * If amestimateparallelvacuum is not provided, assume only
+ * IndexBulkDeleteResult is needed.
+ */
+ if (indexRelation->rd_indam->amestimateparallelvacuum != NULL)
+ {
+ nbytes = indexRelation->rd_indam->amestimateparallelvacuum();
+ Assert(nbytes >= MAXALIGN(sizeof(IndexBulkDeleteResult)));
+ }
+ else
+ nbytes = MAXALIGN(sizeof(IndexBulkDeleteResult));
+
+ return nbytes;
+}

In the v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I
am a bit doubtful about this kind of arrangement, where the code in
the "if" is always unreachable with the current AMs.  I am not sure
what the best way to handle this is; should we just drop
amestimateparallelvacuum altogether?  Currently we are just providing
a size estimate function without a copy function, so even if in the
future some AM gives an estimate of the size of its stats, we cannot
directly memcpy the stats from local memory to shared memory; we might
then also need a copy function from the AM so that it can flatten the
stats and store them in the proper format.
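
To illustrate why a copy function might be needed on top of the
estimator, here is a small hypothetical model (none of these type or
function names are real PostgreSQL or patch APIs) of an AM whose private
stats contain a pointer and so would have to be flattened before going
into shared memory:

#include <stddef.h>
#include <string.h>

/* Hypothetical flat result every AM returns (stands in for IndexBulkDeleteResult). */
typedef struct ToyBulkDeleteResult
{
    long    tuples_removed;
} ToyBulkDeleteResult;

/* Hypothetical AM-private stats that carry a pointer to extra data. */
typedef struct ToyAmStats
{
    ToyBulkDeleteResult common;     /* common part, kept first */
    size_t              nextra;     /* length of the AM-specific payload */
    char               *extra;      /* a plain memcpy would copy only this pointer */
} ToyAmStats;

/* Estimator: how much shared memory the flattened stats will occupy. */
static size_t
toy_am_estimate(const ToyAmStats *stats)
{
    return sizeof(ToyAmStats) + stats->nextra;
}

/* Copy callback: flatten the stats into the space reserved by the estimator. */
static void
toy_am_copy(const ToyAmStats *stats, void *shared_dest)
{
    char   *dst = shared_dest;

    memcpy(dst, stats, sizeof(ToyAmStats));
    memcpy(dst + sizeof(ToyAmStats), stats->extra, stats->nextra);
}

int
main(void)
{
    char        payload[] = "per-index private state";
    ToyAmStats  stats = {{42}, sizeof(payload), payload};
    char        shared[128];    /* stand-in for the DSM area */

    if (toy_am_estimate(&stats) <= sizeof(shared))
        toy_am_copy(&stats, shared);
    return 0;
}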

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, 21 Nov 2019 at 13:25, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 21, 2019 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 6:53 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > > >
> > > > > > > I've attached the latest version patch set. The patch set includes all
> > > > > > > discussed points regarding index AM options as well as shared cost
> > > > > > > balance. Also I added some test cases used all types of index AM.
> > > > > > >
> > > > > > > During developments I had one concern about the number of parallel
> > > > > > > workers to launch. In current design each index AMs can choose the
> > > > > > > participation of parallel bulk-deletion and parallel cleanup. That
> > > > > > > also means the number of parallel worker to launch might be different
> > > > > > > for each time of parallel bulk-deletion and parallel cleanup. In
> > > > > > > current patch the leader will always launch the number of indexes that
> > > > > > > support either one but it would not be efficient in some cases. For
> > > > > > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > > > > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > > > > > 5 workers for each execution but some workers will do nothing at all.
> > > > > > > To deal with this problem, I wonder if we can improve the parallel
> > > > > > > query so that the leader process creates a parallel context with the
> > > > > > > maximum number of indexes and can launch a part of workers instead of
> > > > > > > all of them.
> > > > > > >
> > > > > >
> > > > > > Can't we choose the number of workers as a maximum of
> > > > > > "num_of_indexes_that_support_bulk_del" and
> > > > > > "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> > > > > > always launch the required number of workers for each phase (bulk_del,
> > > > > > cleanup).  In your above example, it should choose 3 workers while
> > > > > > creating a parallel context.  Do you see any problem with that?
> > > > >
> > > > > I might be missing something but if we create the parallel context
> > > > > with 3 workers the leader process always launches 3 workers. Therefore
> > > > > in the above case it launches 3 workers even in cleanup although 2
> > > > > workers is enough.
> > > > >
> > > >
> > > > Right, so we can either extend parallel API to launch fewer workers
> > > > than it has in parallel context as suggested by you or we can use
> > > > separate parallel context for each phase.  Going with the earlier has
> > > > the benefit that we don't need to recreate the parallel context and
> > > > the latter has the advantage that we won't keep additional shared
> > > > memory allocated.
> > >
> > > I also thought to use separate parallel contexts for each phase but
> > > can the same DSM be used by parallel workers  who initiated from
> > > different parallel contexts? If not I think that doesn't work because
> > > the parallel vacuum needs to set data to DSM of ambulkdelete and then
> > > parallel workers for amvacuumcleanup needs to access it.
> > >
> >
> > We can probably copy the stats in local memory instead of pointing it
> > to dsm after bulk-deletion, but I think that would unnecessary
> > overhead and doesn't sound like a good idea.

Right.

>
> I agree that it will be unnecessary overhead.
>
> >
> > > >  BTW, what kind of API change you have in mind for
> > > > the approach you are suggesting?
> > >
> > > I was thinking to add a new API, say LaunchParallelNWorkers(pcxt, n),
> > > where n is the number of workers the caller wants to launch and should
> > > be lower than the value in the parallel context.
> > >
> >
> > For that won't you need to duplicate most of the code of
> > LaunchParallelWorkers or maybe move the entire code in
> > LaunchParallelNWorkers and then LaunchParallelWorkers can also call
> > it.  Another idea could be to just extend the existing API
> > LaunchParallelWorkers to take input parameter as the number of
> > workers, do you see any problem with that or is there a reason you
> > prefer to write a new API for this?
>

Yeah, passing an extra parameter to LaunchParallelWorkers seems to be
a good idea. I just thought that the current API is also reasonable
because the caller of LaunchParallelWorkers doesn't need to care about
the number of workers, which is helpful in some cases, for example
where the caller of CreateParallelContext and the caller of
LaunchParallelWorkers are in different components. However, it's not
a problem since, as far as I can see in the current code, nothing is
designed that way (these functions are called from the same function).

> I think we can pass an extra parameter to LaunchParallelWorkers
> therein we can try to launch min(pcxt->nworkers, n).  Or we can put an
> assert (n <= pcxt->nworkers).

I prefer to use min(pcxt->nworkers, n).

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, 21 Nov 2019 at 14:16, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > > > variable amparallelvacuumoptions which can have below options
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > >
> > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > > want to support parallel vacuum don't have to set anything.
> > > > >
> > > >
> > > > make sense.
> > > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > gin, gist,
> > > > > > spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > >
> > > > > I think gin and bloom don't need to set both but should set only
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > >
> > > > > And I'm going to disallow index AMs to set both
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > by assertions, is that okay?
> > > > >
> > > >
> > > > Sounds reasonable to me.
> > > >
> > > > Are you planning to include the changes related to I/O throttling
> > > > based on the discussion in the nearby thread [1]?  I think you can do
> > > > that if you agree with the conclusion in the last email[1], otherwise,
> > > > we can explore it separately.
> > >
> > > Yes I agreed. I'm going to include that changes in the next version
> > > patches. And I think we will be able to do more discussion based on
> > > the patch.
> > >
> >
> > I've attached the latest version patch set. The patch set includes all
> > discussed points regarding index AM options as well as shared cost
> > balance. Also I added some test cases used all types of index AM.
> >
> > During developments I had one concern about the number of parallel
> > workers to launch. In current design each index AMs can choose the
> > participation of parallel bulk-deletion and parallel cleanup. That
> > also means the number of parallel worker to launch might be different
> > for each time of parallel bulk-deletion and parallel cleanup. In
> > current patch the leader will always launch the number of indexes that
> > support either one but it would not be efficient in some cases. For
> > example, if we have 3 indexes supporting only parallel bulk-deletion
> > and 2 indexes supporting only parallel index cleanup, we would launch
> > 5 workers for each execution but some workers will do nothing at all.
> > To deal with this problem, I wonder if we can improve the parallel
> > query so that the leader process creates a parallel context with the
> > maximum number of indexes and can launch a part of workers instead of
> > all of them.
> >
> +
> + /* compute new balance by adding the local value */
> + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> + new_balance = shared_balance + VacuumCostBalance;
>
> + /* also compute the total local balance */
> + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> +
> + if ((new_balance >= VacuumCostLimit) &&
> + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> + {
> + /* compute sleep time based on the local cost balance */
> + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> + new_balance = shared_balance - VacuumCostBalanceLocal;
> + VacuumCostBalanceLocal = 0;
> + }
> +
> + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> +    &shared_balance,
> +    new_balance))
> + {
> + /* Updated successfully, break */
> + break;
> + }
> While looking at the shared costing delay part, I have noticed that
> while checking the delay condition, we are considering local_balance
> which is VacuumCostBalanceLocal + VacuumCostBalance, but while
> computing the new balance we only reduce shared balance by
> VacuumCostBalanceLocal,  I think it should be reduced with
> local_balance?

 Right.

> I see that later we are adding VacuumCostBalance to
> the VacuumCostBalanceLocal so we are not loosing accounting for this
> balance.  But, I feel it is not right that we compare based on one
> value and operate based on other. I think we can immediately set
> VacuumCostBalanceLocal += VacuumCostBalance before checking the
> condition.

I think we should not do VacuumCostBalanceLocal += VacuumCostBalance
inside the while loop because it is executed repeatedly until the CAS
operation succeeds. Instead, can we move it before the loop and remove
local_balance? The code would look like the following:

if (VacuumSharedCostBalance != NULL)
{
  :
  VacuumCostBalanceLocal += VacuumCostBalance;
  :
  /* Update the shared cost balance value atomically */
  while (true)
  {
      uint32 shared_balance;
      uint32 new_balance;

      msec = 0;

      /* compute new balance by adding the local value */
      shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
      new_balance = shared_balance + VacuumCostBalance;

      if ((new_balance >= VacuumCostLimit) &&
          (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
      {
          /* compute sleep time based on the local cost balance */
          msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
          new_balance = shared_balance - VacuumCostBalanceLocal;
          VacuumCostBalanceLocal = 0;
      }

      if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
                                         &shared_balance,
                                         new_balance))
      {
          /* Updated successfully, break */
          break;
      }
  }

   :
 VacuumCostBalance = 0;
}

Thoughts?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, 21 Nov 2019 at 14:32, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 21, 2019 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > >
> > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > > > > variable amparallelvacuumoptions which can have below options
> > > > > > >
> > > > > > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > > > > > > vacuumcleanup) can't be performed in parallel
> > > > > >
> > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > > > want to support parallel vacuum don't have to set anything.
> > > > > >
> > > > >
> > > > > make sense.
> > > > >
> > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > > flag)
> > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > > gin, gist,
> > > > > > > spgist, bloom will set this flag)
> > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > > and bloom will set this flag)
> > > > > >
> > > > > > I think gin and bloom don't need to set both but should set only
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > > >
> > > > > > And I'm going to disallow index AMs to set both
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > > by assertions, is that okay?
> > > > > >
> > > > >
> > > > > Sounds reasonable to me.
> > > > >
> > > > > Are you planning to include the changes related to I/O throttling
> > > > > based on the discussion in the nearby thread [1]?  I think you can do
> > > > > that if you agree with the conclusion in the last email[1], otherwise,
> > > > > we can explore it separately.
> > > >
> > > > Yes I agreed. I'm going to include that changes in the next version
> > > > patches. And I think we will be able to do more discussion based on
> > > > the patch.
> > > >
> > >
> > > I've attached the latest version patch set. The patch set includes all
> > > discussed points regarding index AM options as well as shared cost
> > > balance. Also I added some test cases used all types of index AM.
> > >
> > > During developments I had one concern about the number of parallel
> > > workers to launch. In current design each index AMs can choose the
> > > participation of parallel bulk-deletion and parallel cleanup. That
> > > also means the number of parallel worker to launch might be different
> > > for each time of parallel bulk-deletion and parallel cleanup. In
> > > current patch the leader will always launch the number of indexes that
> > > support either one but it would not be efficient in some cases. For
> > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > 5 workers for each execution but some workers will do nothing at all.
> > > To deal with this problem, I wonder if we can improve the parallel
> > > query so that the leader process creates a parallel context with the
> > > maximum number of indexes and can launch a part of workers instead of
> > > all of them.
> > >
> > +
> > + /* compute new balance by adding the local value */
> > + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> > + new_balance = shared_balance + VacuumCostBalance;
> >
> > + /* also compute the total local balance */
> > + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> > +
> > + if ((new_balance >= VacuumCostLimit) &&
> > + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> > + {
> > + /* compute sleep time based on the local cost balance */
> > + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > + new_balance = shared_balance - VacuumCostBalanceLocal;
> > + VacuumCostBalanceLocal = 0;
> > + }
> > +
> > + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> > +    &shared_balance,
> > +    new_balance))
> > + {
> > + /* Updated successfully, break */
> > + break;
> > + }
> > While looking at the shared costing delay part, I have noticed that
> > while checking the delay condition, we are considering local_balance
> > which is VacuumCostBalanceLocal + VacuumCostBalance, but while
> > computing the new balance we only reduce shared balance by
> > VacuumCostBalanceLocal,  I think it should be reduced with
> > local_balance?  I see that later we are adding VacuumCostBalance to
> > the VacuumCostBalanceLocal so we are not loosing accounting for this
> > balance.  But, I feel it is not right that we compare based on one
> > value and operate based on other. I think we can immediately set
> > VacuumCostBalanceLocal += VacuumCostBalance before checking the
> > condition.
> >
>
> +/*
> + * index_parallelvacuum_estimate - estimate shared memory for parallel vacuum
> + *
> + * Currently, we don't pass any information to the AM-specific estimator,
> + * so it can probably only return a constant.  In the future, we might need
> + * to pass more information.
> + */
> +Size
> +index_parallelvacuum_estimate(Relation indexRelation)
> +{
> + Size nbytes;
> +
> + RELATION_CHECKS;
> +
> + /*
> + * If amestimateparallelvacuum is not provided, assume only
> + * IndexBulkDeleteResult is needed.
> + */
> + if (indexRelation->rd_indam->amestimateparallelvacuum != NULL)
> + {
> + nbytes = indexRelation->rd_indam->amestimateparallelvacuum();
> + Assert(nbytes >= MAXALIGN(sizeof(IndexBulkDeleteResult)));
> + }
> + else
> + nbytes = MAXALIGN(sizeof(IndexBulkDeleteResult));
> +
> + return nbytes;
> +}
>
> In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch,  I
> am a bit doubtful about this kind of arrangement, where the code in
> the "if" is always unreachable with the current AMs.  I am not sure
> what is the best way to handle this, should we just drop the
> amestimateparallelvacuum altogether?

IIUC the motivation for amestimateparallelvacuum is third-party index
AMs. If such an AM allocates more memory than IndexBulkDeleteResult,
like the current gist index does (although we'll change that), it will
break the index statistics of other indexes or could even cause a
crash. I'm not sure whether such third-party index AMs exist, and it's
true that none of the index AMs in the postgres code will use this
callback, as you mentioned, but I think we need to take care of it
because such usage is still possible.

> Because currently, we are just
> providing a size estimate function without a copy function,  even if
> the in future some Am give an estimate about the size of the stats, we
> can not directly memcpy the stat from the local memory to the shared
> memory, we might then need a copy function also from the AM so that it
> can flatten the stats and store in proper format?

I might be missing something, but why can't we copy the stats from
local memory to the DSM without a callback for copying stats? The lazy
vacuum code gets the pointer to the stats allocated by the index AM,
and it can know their size. So I think we can just memcpy them into
the DSM.
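
As a minimal standalone illustration of that point (toy names only, not
the real structs): if the stats object is flat and the leader knows its
size from the estimator, a plain memcpy into the DSM area suffices:

#include <string.h>

/* Toy flat stats object (stands in for IndexBulkDeleteResult). */
typedef struct ToyBulkDeleteResult
{
    long    tuples_removed;
    long    pages_deleted;
} ToyBulkDeleteResult;

int
main(void)
{
    ToyBulkDeleteResult local = {100, 3};               /* built in local memory */
    char    dsm_area[sizeof(ToyBulkDeleteResult)];      /* stand-in for the DSM */
    size_t  nbytes = sizeof(ToyBulkDeleteResult);       /* known from the estimator */

    /* No AM callback needed: the object is flat and its size is known. */
    memcpy(dsm_area, &local, nbytes);
    return 0;
}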

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:

On Thu, 21 Nov 2019, 13:52 Masahiko Sawada, <masahiko.sawada@2ndquadrant.com> wrote:
On Thu, 21 Nov 2019 at 14:16, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > > > variable amparallelvacuumoptions which can have below options
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > >
> > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > > want to support parallel vacuum don't have to set anything.
> > > > >
> > > >
> > > > make sense.
> > > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > gin, gist,
> > > > > > spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > >
> > > > > I think gin and bloom don't need to set both but should set only
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > >
> > > > > And I'm going to disallow index AMs to set both
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > by assertions, is that okay?
> > > > >
> > > >
> > > > Sounds reasonable to me.
> > > >
> > > > Are you planning to include the changes related to I/O throttling
> > > > based on the discussion in the nearby thread [1]?  I think you can do
> > > > that if you agree with the conclusion in the last email[1], otherwise,
> > > > we can explore it separately.
> > >
> > > Yes I agreed. I'm going to include that changes in the next version
> > > patches. And I think we will be able to do more discussion based on
> > > the patch.
> > >
> >
> > I've attached the latest version patch set. The patch set includes all
> > discussed points regarding index AM options as well as shared cost
> > balance. Also I added some test cases used all types of index AM.
> >
> > During developments I had one concern about the number of parallel
> > workers to launch. In current design each index AMs can choose the
> > participation of parallel bulk-deletion and parallel cleanup. That
> > also means the number of parallel worker to launch might be different
> > for each time of parallel bulk-deletion and parallel cleanup. In
> > current patch the leader will always launch the number of indexes that
> > support either one but it would not be efficient in some cases. For
> > example, if we have 3 indexes supporting only parallel bulk-deletion
> > and 2 indexes supporting only parallel index cleanup, we would launch
> > 5 workers for each execution but some workers will do nothing at all.
> > To deal with this problem, I wonder if we can improve the parallel
> > query so that the leader process creates a parallel context with the
> > maximum number of indexes and can launch a part of workers instead of
> > all of them.
> >
> +
> + /* compute new balance by adding the local value */
> + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> + new_balance = shared_balance + VacuumCostBalance;
>
> + /* also compute the total local balance */
> + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> +
> + if ((new_balance >= VacuumCostLimit) &&
> + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> + {
> + /* compute sleep time based on the local cost balance */
> + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> + new_balance = shared_balance - VacuumCostBalanceLocal;
> + VacuumCostBalanceLocal = 0;
> + }
> +
> + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> +    &shared_balance,
> +    new_balance))
> + {
> + /* Updated successfully, break */
> + break;
> + }
> While looking at the shared costing delay part, I have noticed that
> while checking the delay condition, we are considering local_balance
> which is VacuumCostBalanceLocal + VacuumCostBalance, but while
> computing the new balance we only reduce shared balance by
> VacuumCostBalanceLocal,  I think it should be reduced with
> local_balance?

 Right.

> I see that later we are adding VacuumCostBalance to
> the VacuumCostBalanceLocal so we are not loosing accounting for this
> balance.  But, I feel it is not right that we compare based on one
> value and operate based on other. I think we can immediately set
> VacuumCostBalanceLocal += VacuumCostBalance before checking the
> condition.

I think we should not do VacuumCostBalanceLocal += VacuumCostBalance
inside the while loop because it's repeatedly executed until CAS
operation succeeds. Instead we can move it before the loop and remove
local_balance?

Right, I meant before the loop.
The code would be like the following:

if (VacuumSharedCostBalance != NULL)
{
  :
  VacuumCostBalanceLocal += VacuumCostBalance;
  :
  /* Update the shared cost balance value atomically */
  while (true)
  {
      uint32 shared_balance;
      uint32 new_balance;

      msec = 0;

      /* compute new balance by adding the local value */
      shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
      new_balance = shared_balance + VacuumCostBalance;

      if ((new_balance >= VacuumCostLimit) &&
          (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
      {
          /* compute sleep time based on the local cost balance */
          msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
          new_balance = shared_balance - VacuumCostBalanceLocal;
          VacuumCostBalanceLocal = 0;
      }

      if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
                                         &shared_balance,
                                         new_balance))
      {
          /* Updated successfully, break */
          break;
      }
  }

   :
 VacuumCostBalance = 0;
}

Thoughts?

Looks fine to me.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, 21 Nov 2019, 14:15 Masahiko Sawada, <masahiko.sawada@2ndquadrant.com> wrote:
On Thu, 21 Nov 2019 at 14:32, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 21, 2019 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > >
> > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > > > > variable amparallelvacuumoptions which can have below options
> > > > > > >
> > > > > > > VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> > > > > > > vacuumcleanup) can't be performed in parallel
> > > > > >
> > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > > > want to support parallel vacuum don't have to set anything.
> > > > > >
> > > > >
> > > > > make sense.
> > > > >
> > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > > flag)
> > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> > > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > > gin, gist,
> > > > > > > spgist, bloom will set this flag)
> > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > > and bloom will set this flag)
> > > > > >
> > > > > > I think gin and bloom don't need to set both but should set only
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > > >
> > > > > > And I'm going to disallow index AMs to set both
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > > by assertions, is that okay?
> > > > > >
> > > > >
> > > > > Sounds reasonable to me.
> > > > >
> > > > > Are you planning to include the changes related to I/O throttling
> > > > > based on the discussion in the nearby thread [1]?  I think you can do
> > > > > that if you agree with the conclusion in the last email[1], otherwise,
> > > > > we can explore it separately.
> > > >
> > > > Yes I agreed. I'm going to include that changes in the next version
> > > > patches. And I think we will be able to do more discussion based on
> > > > the patch.
> > > >
> > >
> > > I've attached the latest version patch set. The patch set includes all
> > > discussed points regarding index AM options as well as shared cost
> > > balance. Also I added some test cases used all types of index AM.
> > >
> > > During developments I had one concern about the number of parallel
> > > workers to launch. In current design each index AMs can choose the
> > > participation of parallel bulk-deletion and parallel cleanup. That
> > > also means the number of parallel worker to launch might be different
> > > for each time of parallel bulk-deletion and parallel cleanup. In
> > > current patch the leader will always launch the number of indexes that
> > > support either one but it would not be efficient in some cases. For
> > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > 5 workers for each execution but some workers will do nothing at all.
> > > To deal with this problem, I wonder if we can improve the parallel
> > > query so that the leader process creates a parallel context with the
> > > maximum number of indexes and can launch a part of workers instead of
> > > all of them.
> > >
> > +
> > + /* compute new balance by adding the local value */
> > + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> > + new_balance = shared_balance + VacuumCostBalance;
> >
> > + /* also compute the total local balance */
> > + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> > +
> > + if ((new_balance >= VacuumCostLimit) &&
> > + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> > + {
> > + /* compute sleep time based on the local cost balance */
> > + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > + new_balance = shared_balance - VacuumCostBalanceLocal;
> > + VacuumCostBalanceLocal = 0;
> > + }
> > +
> > + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> > +    &shared_balance,
> > +    new_balance))
> > + {
> > + /* Updated successfully, break */
> > + break;
> > + }
> > While looking at the shared costing delay part, I have noticed that
> > while checking the delay condition, we are considering local_balance
> > which is VacuumCostBalanceLocal + VacuumCostBalance, but while
> > computing the new balance we only reduce shared balance by
> > VacuumCostBalanceLocal,  I think it should be reduced with
> > local_balance?  I see that later we are adding VacuumCostBalance to
> > the VacuumCostBalanceLocal so we are not loosing accounting for this
> > balance.  But, I feel it is not right that we compare based on one
> > value and operate based on other. I think we can immediately set
> > VacuumCostBalanceLocal += VacuumCostBalance before checking the
> > condition.
> >
>
> +/*
> + * index_parallelvacuum_estimate - estimate shared memory for parallel vacuum
> + *
> + * Currently, we don't pass any information to the AM-specific estimator,
> + * so it can probably only return a constant.  In the future, we might need
> + * to pass more information.
> + */
> +Size
> +index_parallelvacuum_estimate(Relation indexRelation)
> +{
> + Size nbytes;
> +
> + RELATION_CHECKS;
> +
> + /*
> + * If amestimateparallelvacuum is not provided, assume only
> + * IndexBulkDeleteResult is needed.
> + */
> + if (indexRelation->rd_indam->amestimateparallelvacuum != NULL)
> + {
> + nbytes = indexRelation->rd_indam->amestimateparallelvacuum();
> + Assert(nbytes >= MAXALIGN(sizeof(IndexBulkDeleteResult)));
> + }
> + else
> + nbytes = MAXALIGN(sizeof(IndexBulkDeleteResult));
> +
> + return nbytes;
> +}
>
> In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch,  I
> am a bit doubtful about this kind of arrangement, where the code in
> the "if" is always unreachable with the current AMs.  I am not sure
> what is the best way to handle this, should we just drop the
> amestimateparallelvacuum altogether?

IIUC the motivation for amestimateparallelvacuum is third-party index
AMs. If such an AM allocates more memory than IndexBulkDeleteResult,
like the current gist index does (although we'll change that), it will
break the index statistics of other indexes or can even cause a crash.
I'm not sure whether such third-party index AMs exist, and it's true
that no index AM in the postgres code will use this callback, as you
mentioned, but I think we need to take care of it because such usage
is still possible.
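
For example (purely hypothetical; "FooBulkDeleteResult" and
"fooestimateparallelvacuum" are made-up names, not anything in the
patch), such a third-party AM could report its real size like this:

typedef struct FooBulkDeleteResult
{
    IndexBulkDeleteResult stats;        /* must be the first field */
    BlockNumber           pages_scanned; /* AM-specific extra data */
} FooBulkDeleteResult;

static Size
fooestimateparallelvacuum(void)
{
    /* reserve space for the whole extended struct in the DSM area */
    return MAXALIGN(sizeof(FooBulkDeleteResult));
}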

> Because currently, we are just
> providing a size estimate function without a copy function,  even if
> the in future some Am give an estimate about the size of the stats, we
> can not directly memcpy the stat from the local memory to the shared
> memory, we might then need a copy function also from the AM so that it
> can flatten the stats and store in proper format?

I might be missing something, but why can't we copy the stats from
local memory to the DSM without a callback for copying stats? The
lazy vacuum code gets the pointer to the stats allocated by the index
AM and can know their size, so I think we can just memcpy them to the
DSM.
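
Something like the following sketch is what I have in mind (the
get_indstats() accessor for the index's slot in the DSM area is a
placeholder name, not the exact patch code):

/*
 * After ambulkdelete/amvacuumcleanup has returned its statistics for
 * index i, the leader copies them into the shared area.
 */
IndexBulkDeleteResult *result = stats[i];               /* palloc'd by the index AM */
IndexBulkDeleteResult *slot = get_indstats(lvshared, i); /* this index's slot in DSM */
Size        size = index_parallelvacuum_estimate(Irel[i]); /* size known to lazy vacuum */

if (result != NULL)
    memcpy(slot, result, size);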

Oh sure.  But what I meant is that if an AM keeps pointers in its stats, as GistBulkDeleteResult does, we might not be able to copy it directly outside the AM.  So I thought that if we had a callback for the copy, the AM could flatten the stats such that IndexBulkDeleteResult is followed by the AM-specific stats.  Yeah, but someone may argue that we might as well force the AM to return the stats in a form that can be memcpy'd directly.  So I think I am fine with the way it is.

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Nov 21, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, 21 Nov 2019, 14:15 Masahiko Sawada, <masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Thu, 21 Nov 2019 at 14:32, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> >
>> >
>> > In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch,  I
>> > am a bit doubtful about this kind of arrangement, where the code in
>> > the "if" is always unreachable with the current AMs.  I am not sure
>> > what is the best way to handle this, should we just drop the
>> > amestimateparallelvacuum altogether?
>>
>> IIUC the motivation of amestimateparallelvacuum is for third party
>> index AM. If it allocates memory more than IndexBulkDeleteResult like
>> the current gist indexes (although we'll change it) it will break
>> index statistics of other indexes or even can be cause of crash. I'm
>> not sure there is such third party index AMs and it's true that all
>> index AMs in postgres code will not use this callback as you
>> mentioned, but I think we need to take care of it because such usage
>> is still possible.
>>
>> > Because currently, we are just
>> > providing a size estimate function without a copy function,  even if
>> > the in future some Am give an estimate about the size of the stats, we
>> > can not directly memcpy the stat from the local memory to the shared
>> > memory, we might then need a copy function also from the AM so that it
>> > can flatten the stats and store in proper format?
>>
>> I might be missing something but why can't we copy the stats from the
>> local memory to the DSM without the callback for copying stats? The
>> lazy vacuum code will get the pointer of the stats that are allocated
>> by index AM and the code can know the size of it. So I think we can
>> just memcpy to DSM.
>
>
> Oh sure.  But, what I meant is that if AM may keep pointers in its stats as GistBulkDeleteResult do so we might not
> be able to copy directly outside the AM.  So I thought that if we have a call back for the copy then the AM can flatten
> the stats such that IndexBulkDeleteResult, followed by AM specific stats.  Yeah but someone may argue that we might
> force the AM to return the stats in a form that it can be memcpy directly.  So I think I am fine with the way it is.
>

I think we have discussed this point earlier as well and the
conclusion was to provide an API if there is a need for the same.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> I've attached the latest version patch set. The patch set includes all
> discussed points regarding index AM options as well as shared cost
> balance. Also I added some test cases used all types of index AM.
>

I have reviewed the first patch and made a number of modifications,
including adding/modifying comments and making some corrections and
modifications in the documentation. You can find my changes in
v33-0001-delta-amit.patch.  See if those look okay to you; if so,
please include them in the next version of the patch.  I am attaching
both your version of the patch and my delta changes.

One comment on v33-0002-Add-parallel-option-to-VACUUM-command:

+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN
(nindexes)));
..
+ shared->offset = add_size(SizeOfLVShared, BITMAPLEN(nindexes));

Here, don't you need to do MAXALIGN to set offset as we are computing
it that way while estimating shared memory?  If not, then probably,
some comments are required to explain it.
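
For example, something like the following would keep the estimate and
the offset consistent (just a sketch of what I mean):

shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));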


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Nov 22, 2019 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I've attached the latest version patch set. The patch set includes all
> > discussed points regarding index AM options as well as shared cost
> > balance. Also I added some test cases used all types of index AM.
> >
>
> I have reviewed the first patch and made a number of modifications
> that include adding/modifying comments, made some corrections and
> modifications in the documentation. You can find my changes in
> v33-0001-delta-amit.patch.
>

I have continued my review of this patch series and reviewed/hacked
the second patch.  I have added/modified comments, changed the function
ordering in the file to make it consistent, and made a few other changes.
You can find my changes in v33-0002-delta-amit.patch.   Are you
working on the review comments given recently?  If you have not started
yet, then it might be better to prepare a patch atop the v33 version, as
I am also going to work on this patch series; that way it will be easy
to merge changes.  OTOH, if you are already working on those, then it
is fine, and I can merge any remaining changes with your new patch.
Whatever the case, please let me know.

Few more comments on v33-0002-Add-parallel-option-to-VACUUM-command.patch:

---------------------------------------------------------------------------------------------------------------------------
1.
+ * leader process re-initializes the parallel context while keeping recorded
+ * dead tuples so that the leader can launch parallel workers again in the next
+ * time.

In this sentence, it is not clear to me why we need to keep the
recorded dead tuples while re-initializing parallel workers.  The next
time workers are launched, they should process a new set of dead
tuples, no?

2.
lazy_parallel_vacuum_or_cleanup_indexes()
{
..
+ /*
+ * Increment the active worker count. We cannot decrement until the
+ * all parallel workers finish.
+ */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /*
+ * Join as parallel workers. The leader process alone does that in
+ * case where no workers launched.
+ */
+ if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0)
+     vacuum_or_cleanup_indexes_worker(Irel, nindexes, stats, lps->lvshared,
+                                      vacrelstats->dead_tuples);
+
+ /*
+ * Here, the indexes that had been skipped during parallel index vacuuming
+ * are remaining. If there are such indexes the leader process does vacuum
+ * or cleanup them one by one.
+ */
+ nindexes_remains = nindexes - pg_atomic_read_u32(&(lps->lvshared->nprocessed));
+ if (nindexes_remains > 0)
+ {
+     int i;
+#ifdef USE_ASSERT_CHECKING
+     int nprocessed = 0;
+#endif
+
+     for (i = 0; i < nindexes; i++)
+     {
+         bool processed = !skip_parallel_index_vacuum(Irel[i],
+                                                      lps->lvshared->for_cleanup,
+                                                      lps->lvshared->first_time);
+
+         /* Skip the already processed indexes */
+         if (processed)
+             continue;
+
+         if (lps->lvshared->for_cleanup)
+             lazy_cleanup_index(Irel[i], &stats[i],
+                                vacrelstats->new_rel_tuples,
+                                vacrelstats->tupcount_pages < vacrelstats->rel_pages);
+         else
+             lazy_vacuum_index(Irel[i], &stats[i], vacrelstats->dead_tuples,
+                               vacrelstats->old_live_tuples);
+#ifdef USE_ASSERT_CHECKING
+         nprocessed++;
+#endif
+     }
+#ifdef USE_ASSERT_CHECKING
+     Assert(nprocessed == nindexes_remains);
+#endif
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
..
}

Here, it seems that we can increment/decrement the
VacuumActiveNWorkers even when there is no work performed by the
leader backend.  How about moving increment/decrement inside function
vacuum_or_cleanup_indexes_worker?  In that case, we need to do it in
this function when we are actually doing an index vacuum or cleanup.
After doing that the other usage of increment/decrement of
VacuumActiveNWorkers in other function heap_parallel_vacuum_main can
be removed.
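
In other words, something with roughly this shape (just a sketch; the
guard is needed because the function may also run when no parallel
workers are involved):

static void
vacuum_or_cleanup_indexes_worker(Relation *Irel, int nindexes,
                                 IndexBulkDeleteResult **stats,
                                 LVShared *lvshared,
                                 LVDeadTuples *dead_tuples)
{
    /* Increment the active worker count only while doing index work */
    if (VacuumActiveNWorkers)
        pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);

    /* ... vacuum or cleanup the assigned indexes ... */

    /* Decrement it once our share of the index work is done */
    if (VacuumActiveNWorkers)
        pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
}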

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, 22 Nov 2019 at 10:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I've attached the latest version patch set. The patch set includes all
> > discussed points regarding index AM options as well as shared cost
> > balance. Also I added some test cases used all types of index AM.
> >
>
> I have reviewed the first patch and made a number of modifications
> that include adding/modifying comments, made some corrections and
> modifications in the documentation. You can find my changes in
> v33-0001-delta-amit.patch.  See, if those look okay to you, if so,
> please include those in the next version of the patch.  I am attaching
> both your version of patch and delta changes by me.

Thank you.

All the changes look good to me. But after the changes to the 0002
patch, the two macros for parallel vacuum options
(VACUUM_OPTIONS_SUPPORT_XXX) are no longer necessary. So we can remove
them and add them back if we need them again.

>
> One comment on v33-0002-Add-parallel-option-to-VACUUM-command:
>
> + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
> + est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN
> (nindexes)));
> ..
> + shared->offset = add_size(SizeOfLVShared, BITMAPLEN(nindexes));
>
> Here, don't you need to do MAXALIGN to set offset as we are computing
> it that way while estimating shared memory?  If not, then probably,
> some comments are required to explain it.

You're right. Will fix it.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Nov 25, 2019 at 9:42 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 22 Nov 2019 at 10:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > I've attached the latest version patch set. The patch set includes all
> > > discussed points regarding index AM options as well as shared cost
> > > balance. Also I added some test cases used all types of index AM.
> > >
> >
> > I have reviewed the first patch and made a number of modifications
> > that include adding/modifying comments, made some corrections and
> > modifications in the documentation. You can find my changes in
> > v33-0001-delta-amit.patch.  See, if those look okay to you, if so,
> > please include those in the next version of the patch.  I am attaching
> > both your version of patch and delta changes by me.
>
> Thank you.
>
> All changes look good to me. But after changed the 0002 patch the two
> macros for parallel vacuum options (VACUUM_OPTIONS_SUPPORT_XXX) is no
> longer necessary. So we can remove them and can add if we need them
> again.
>

Sounds reasonable.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Nov 25, 2019 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 2.
> lazy_parallel_vacuum_or_cleanup_indexes()
> {
> ..
> ..
> }
>
> Here, it seems that we can increment/decrement the
> VacuumActiveNWorkers even when there is no work performed by the
> leader backend.  How about moving increment/decrement inside function
> vacuum_or_cleanup_indexes_worker?  In that case, we need to do it in
> this function when we are actually doing an index vacuum or cleanup.
> After doing that the other usage of increment/decrement of
> VacuumActiveNWorkers in other function heap_parallel_vacuum_main can
> be removed.
>

One of my colleagues, Mahendra, who was testing this patch, found that
the index stats reported by the view pg_statio_all_tables are wrong for
parallel vacuum.  I debugged the issue and found two problems in the
stats-related code.
1. The function get_indstats seems to compute the wrong stats value for
the last index.
2. The function lazy_parallel_vacuum_or_cleanup_indexes() was not
pointing to the computed stats when the parallel index scan is skipped.

You can find fixes for both issues in the attached patch.  This is on
top of the patches I sent yesterday [1].

Some more comments on v33-0002-Add-parallel-option-to-VACUUM-command
-------------------------------------------------------------------------------------------------------------
1.  The code in function lazy_parallel_vacuum_or_cleanup_indexes()
that processes the indexes that have skipped parallel processing can
be moved to a separate function.  Further, the newly added code by the
attached patch can also be moved to a separate function as the same
code is used in function vacuum_or_cleanup_indexes_worker().

2.
+void
+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
..
+ stats = (IndexBulkDeleteResult **)
+ palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
..
}

It would be neat if we free this memory once it is used.

3.
+ /*
+ * Compute the number of indexes that can participate to parallel index
+ * vacuuming.
+ */

/to/in

4.  The function lazy_parallel_vacuum_or_cleanup_indexes() launches
workers without checking whether it needs to do so.  For example, in
the cleanup phase it is possible that we don't need to launch any
worker, so launching them would be a waste.  It might be that you are
already planning to handle this based on the previous
comments/discussion, in which case you can ignore this point.


[1] - https://www.postgresql.org/message-id/CAA4eK1LQ%2BYGjmSS-XqhuAa6eb%3DXykpx1LiT7UXJHmEKP%3D0QtsA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 26 Nov 2019 at 13:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 25, 2019 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 2.
> > lazy_parallel_vacuum_or_cleanup_indexes()
> > {
> > ..
> > ..
> > }
> >
> > Here, it seems that we can increment/decrement the
> > VacuumActiveNWorkers even when there is no work performed by the
> > leader backend.  How about moving increment/decrement inside function
> > vacuum_or_cleanup_indexes_worker?  In that case, we need to do it in
> > this function when we are actually doing an index vacuum or cleanup.
> > After doing that the other usage of increment/decrement of
> > VacuumActiveNWorkers in other function heap_parallel_vacuum_main can
> > be removed.

Yeah, we can move it inside vacuum_or_cleanup_indexes_worker, but we
still need to increment the count before processing the indexes that
have skipped parallel operations, because some workers might still be
running at that point.

> >
>
> One of my colleague Mahendra who was testing this patch found that
> stats for index reported by view pg_statio_all_tables are wrong for
> parallel vacuum.  I debugged the issue and found that there were two
> problems in the stats related code.
> 1. The function get_indstats seem to be computing the wrong value of
> stats for the last index.
> 2. The function lazy_parallel_vacuum_or_cleanup_indexes() was not
> pointing to the computed stats when the parallel index scan is
> skipped.
>
> Find the above two fixes in the attached patch.  This is on top of the
> patches I sent yesterday [1].

Thank you! While testing the current patch myself, I also found this bug.

>
> Some more comments on v33-0002-Add-parallel-option-to-VACUUM-command
> -------------------------------------------------------------------------------------------------------------
> 1.  The code in function lazy_parallel_vacuum_or_cleanup_indexes()
> that processes the indexes that have skipped parallel processing can
> be moved to a separate function.  Further, the newly added code by the
> attached patch can also be moved to a separate function as the same
> code is used in function vacuum_or_cleanup_indexes_worker().
>
> 2.
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> {
> ..
> + stats = (IndexBulkDeleteResult **)
> + palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
> ..
> }
>
> It would be neat if we free this memory once it is used.
>
> 3.
> + /*
> + * Compute the number of indexes that can participate to parallel index
> + * vacuuming.
> + */
>
> /to/in
>
> 4.  The function lazy_parallel_vacuum_or_cleanup_indexes() launches
> workers without checking whether it needs to do the same or not.  For
> ex. in cleanup phase, it is possible that we don't need to launch any
> worker, so it will be waste.  It might be that you are already
> planning to handle it based on the previous comments/discussion in
> which case you can ignore this.

I've incorporated the comments I got so far, including the above and
the memory alignment issue. The attached v34 patch therefore includes
those changes as well as the changes in v33-0002-delta-amit.patch and
v33-0002-delta2-fix-stats-issue.patch. In this version I added an extra
argument to the LaunchParallelWorkers function so that the leader
process launches only as many parallel workers as the particular phase
needs.
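
The interface change looks roughly like this (a simplified sketch; the
real function of course does more than is shown here):

void
LaunchParallelWorkers(ParallelContext *pcxt, int nworkers)
{
    /*
     * Launch only as many workers as the caller asked for, capped by
     * the number the parallel context was created with.
     */
    int nworkers_to_launch = Min(nworkers, pcxt->nworkers);
    int i;

    for (i = 0; i < nworkers_to_launch; i++)
    {
        /* register and start background worker i, as before ... */
    }
}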

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
>
> I've incorporated the comments I got so far including the above and
> the memory alignment issue.
>

Thanks, I will look into the new version.  BTW, why haven't you posted
the 0001 patch (the IndexAM API patch)?  I think without it we need to
use the previous version of that patch. Also, I think we should post
Dilip's patch with the Gist index modifications for parallel vacuum [1],
or at least mention it while posting a new version, as without that
even make check fails.

[1] - https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 27, 2019 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> >
> > I've incorporated the comments I got so far including the above and
> > the memory alignment issue.
> >
>
> Thanks, I will look into the new version.
>

Few comments:
-----------------------
1.
+static void
+vacuum_or_cleanup_indexes_worker(Relation *Irel, int nindexes,
+ IndexBulkDeleteResult **stats,
+ LVShared *lvshared,
+ LVDeadTuples *dead_tuples)
+{
+ /* Increment the active worker count */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);

The above code is wrong because it is possible that this function is
called even when there are no workers in which case
VacuumActiveNWorkers will be NULL.

2.
+ /* Take over the shared balance value to heap scan */
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);

We can carry over shared balance only if the same is active.

3.
+ if (Irel[i]->rd_indam->amparallelvacuumoptions ==
+ VACUUM_OPTION_NO_PARALLEL)
+ {
+ /* Set NULL as this index does not support parallel vacuum */
+ lvshared->bitmap[i >> 3] |= 0 << (i & 0x07);

Can we avoid setting this for each index by initializing the bitmap as
all NULLs, as is done in the attached patch?  (See the sketch after
comment 4 below.)

4.
+ /*
+ * Variables to control parallel index vacuuming.  Index statistics
+ * returned from ambulkdelete and amvacuumcleanup is nullable variable
+ * length.  'offset' is NULL bitmap. Note that a 0 indicates a null,
+ * while 1 indicates non-null.  The index statistics follows at end of
+ * struct.
+ */

This comment is not clear, so I have re-worded it.  See, if the
changed comment makes sense.
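
Going back to comment 3, here is a minimal sketch of what I mean (the
actual delta patch may differ; the bit-setting loop is shown only for
illustration):

/* start with all index-stats entries marked NULL ... */
memset(lvshared->bitmap, 0, BITMAPLEN(nindexes));

for (i = 0; i < nindexes; i++)
{
    /* ... and set a bit only for indexes that support parallel vacuum */
    if (Irel[i]->rd_indam->amparallelvacuumoptions != VACUUM_OPTION_NO_PARALLEL)
        lvshared->bitmap[i >> 3] |= 1 << (i & 0x07);
}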

I have fixed all the above issues, made a couple of other cosmetic
changes and modified a few comments.  See the changes in
v34-0002-delta-amit.  I am attaching just the delta patch on top of
v34-0002-Add-parallel-option-to-VACUUM-command.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Wed, 27 Nov 2019 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
>
> I've incorporated the comments I got so far including the above and
> the memory alignment issue.
>

Thanks, I will look into the new version.  BTW, why haven't you posted
0001 patch (IndexAM API's patch)?  I think without that we need to use
the previous version for that. Also, I think we should post Dilip's
patch related to Gist index [1] modifications for parallel vacuum or
at least have a mention for that while posting a new version as
without that even make check fails.

[1] - https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com


I did some testing on top of the v33 patch set. By debugging, I was able to hit one assert in lazy_parallel_vacuum_or_cleanup_indexes:
TRAP: FailedAssertion("nprocessed == nindexes_remains", File: "vacuumlazy.c", Line: 2099)

I debugged further and found that this assert is not valid in all cases. nprocessed can be less than nindexes_remains because it is possible that a parallel worker has been launched for vacuum and the index count has been incremented in vacuum_or_cleanup_indexes_worker for a particular index while its work is still not finished (lvshared->nprocessed is not incremented yet); in that case, nprocessed will be less than nindexes_remains. I think we should remove this assert.

I have one comment about the assert-only variable:

+#ifdef USE_ASSERT_CHECKING
+ int nprocessed = 0;
+#endif

I think we can change the above declaration to "int nprocessed PG_USED_FOR_ASSERTS_ONLY = 0" so that the code looks better, because USE_ASSERT_CHECKING is used in 3 places within 20-30 lines of code.
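
That is, something like this (sketch):

+ int nprocessed PG_USED_FOR_ASSERTS_ONLY = 0;
..
+ nprocessed++;        /* no #ifdef USE_ASSERT_CHECKING block needed */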

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 27 Nov 2019 at 13:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 27, 2019 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > >
> > > I've incorporated the comments I got so far including the above and
> > > the memory alignment issue.
> > >
> >
> > Thanks, I will look into the new version.
> >
>
> Few comments:
> -----------------------
> 1.
> +static void
> +vacuum_or_cleanup_indexes_worker(Relation *Irel, int nindexes,
> + IndexBulkDeleteResult **stats,
> + LVShared *lvshared,
> + LVDeadTuples *dead_tuples)
> +{
> + /* Increment the active worker count */
> + pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
>
> The above code is wrong because it is possible that this function is
> called even when there are no workers in which case
> VacuumActiveNWorkers will be NULL.
>
> 2.
> + /* Take over the shared balance value to heap scan */
> + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
>
> We can carry over shared balance only if the same is active.
>
> 3.
> + if (Irel[i]->rd_indam->amparallelvacuumoptions ==
> + VACUUM_OPTION_NO_PARALLEL)
> + {
> + /* Set NULL as this index does not support parallel vacuum */
> + lvshared->bitmap[i >> 3] |= 0 << (i & 0x07);
>
> Can we avoid setting this for each index by initializing bitmap as all
> NULL's as is done in the attached patch?
>
> 4.
> + /*
> + * Variables to control parallel index vacuuming.  Index statistics
> + * returned from ambulkdelete and amvacuumcleanup is nullable variable
> + * length.  'offset' is NULL bitmap. Note that a 0 indicates a null,
> + * while 1 indicates non-null.  The index statistics follows at end of
> + * struct.
> + */
>
> This comment is not clear, so I have re-worded it.  See, if the
> changed comment makes sense.
>
> I have fixed all the above issues, made a couple of other cosmetic
> changes and modified a few comments.  See the changes in
> v34-0002-delta-amit.  I am attaching just the delta patch on top of
> v34-0002-Add-parallel-option-to-VACUUM-command.
>

Thank you for reviewing this patch. All the changes you made look good to me.

I thought I had already posted all the v34 patches but hadn't, sorry. So
I've attached the v35 patch set, which incorporates your changes and
includes Dilip's patch for the gist index (0001). These patches can be
applied on top of the current HEAD, and make check should pass.
Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 27 Nov 2019 at 13:28, Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Wed, 27 Nov 2019 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada
>> <masahiko.sawada@2ndquadrant.com> wrote:
>> >
>> >
>> > I've incorporated the comments I got so far including the above and
>> > the memory alignment issue.
>> >
>>
>> Thanks, I will look into the new version.  BTW, why haven't you posted
>> 0001 patch (IndexAM API's patch)?  I think without that we need to use
>> the previous version for that. Also, I think we should post Dilip's
>> patch related to Gist index [1] modifications for parallel vacuum or
>> at least have a mention for that while posting a new version as
>> without that even make check fails.
>>
>> [1] - https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com
>>
>
> I did some testing on the top of v33 patch set. By debugging, I was able to hit one assert in
> lazy_parallel_vacuum_or_cleanup_indexes.
> TRAP: FailedAssertion("nprocessed == nindexes_remains", File: "vacuumlazy.c", Line: 2099)
>
> I further debugged and found that this assert is not valid in all the cases. Here, nprocessed can be less than
> nindexes_remains in some cases because it is possible that parallel worker is launched for vacuum and idx count is
> incremented in vacuum_or_cleanup_indexes_worker for particular index but work is still not
> finished (lvshared->nprocessed is not incremented yet) so in that case, nprocessed will be less than nindexes_remains.
> I think, we should remove this assert.
>
> I have one comment for assert used variable:
>
> +#ifdef USE_ASSERT_CHECKING
> + int nprocessed = 0;
> +#endif
>
> I think, we can make above declaration as " int nprocessed PG_USED_FOR_ASSERTS_ONLY = 0" so that code looks good
> because this USE_ASSERT_CHECKING is used in 3 places in 20-30 code lines.

Thank you for testing!

Yes, I think your analysis is right. I've removed the assertion in v35
patch that I've just posted[1].

[1] https://www.postgresql.org/message-id/CA%2Bfd4k5oAuGuwZ9XaOTv%2BcTU8-dmA3RjpJ%2Bi4x5kt9VbAFse1w%40mail.gmail.com


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Wed, 27 Nov 2019 at 23:14, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 27 Nov 2019 at 13:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 27, 2019 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > >
> > > I've incorporated the comments I got so far including the above and
> > > the memory alignment issue.
> > >
> >
> > Thanks, I will look into the new version.
> >
>
> Few comments:
> -----------------------
> 1.
> +static void
> +vacuum_or_cleanup_indexes_worker(Relation *Irel, int nindexes,
> + IndexBulkDeleteResult **stats,
> + LVShared *lvshared,
> + LVDeadTuples *dead_tuples)
> +{
> + /* Increment the active worker count */
> + pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
>
> The above code is wrong because it is possible that this function is
> called even when there are no workers in which case
> VacuumActiveNWorkers will be NULL.
>
> 2.
> + /* Take over the shared balance value to heap scan */
> + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
>
> We can carry over shared balance only if the same is active.
>
> 3.
> + if (Irel[i]->rd_indam->amparallelvacuumoptions ==
> + VACUUM_OPTION_NO_PARALLEL)
> + {
> + /* Set NULL as this index does not support parallel vacuum */
> + lvshared->bitmap[i >> 3] |= 0 << (i & 0x07);
>
> Can we avoid setting this for each index by initializing bitmap as all
> NULL's as is done in the attached patch?
>
> 4.
> + /*
> + * Variables to control parallel index vacuuming.  Index statistics
> + * returned from ambulkdelete and amvacuumcleanup is nullable variable
> + * length.  'offset' is NULL bitmap. Note that a 0 indicates a null,
> + * while 1 indicates non-null.  The index statistics follows at end of
> + * struct.
> + */
>
> This comment is not clear, so I have re-worded it.  See, if the
> changed comment makes sense.
>
> I have fixed all the above issues, made a couple of other cosmetic
> changes and modified a few comments.  See the changes in
> v34-0002-delta-amit.  I am attaching just the delta patch on top of
> v34-0002-Add-parallel-option-to-VACUUM-command.
>

Thank you for reviewing this patch. All changes you made looks good to me.

I thought I already have posted all v34 patches but didn't, sorry. So
I've attached v35 patch set that incorporated your changes and it
includes Dilip's patch for gist index (0001). These patches can be
applied on top of the current HEAD and make check should pass.

Thanks for the rebased patches.

On top of the v35 patch, I can see one compilation warning.
parallel.c: In function ‘LaunchParallelWorkers’:
parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  int   i;
  ^

The above warning is due to an extra semicolon added at the end of a declaration line in the v35-0003 patch. Please fix this in the next version.
+   int         nworkers_to_launch = Min(nworkers, pcxt->nworkers);;

I will continue my testing on top of the v35 patch set and will post the results.

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 27 Nov 2019 at 19:21, Mahendra Singh <mahi6run@gmail.com> wrote:
>
>
> Thanks for the re-based patches.
>
> On the top of v35 patch, I can see one compilation warning.
>>
>> parallel.c: In function ‘LaunchParallelWorkers’:
>> parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
>>   int   i;
>>   ^
>
>
> Above warning is due to one extra semicolon added at the end of declaration line in v35-0003 patch. Please fix this
> in next version.
> +   int         nworkers_to_launch = Min(nworkers, pcxt->nworkers);;

Thanks. I will fix it in the next version of the patch.

>
> I will continue my testing on the top of v35 patch set and will post results.

Thank you!

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Thu, 28 Nov 2019 at 13:32, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 27 Nov 2019 at 19:21, Mahendra Singh <mahi6run@gmail.com> wrote:
>
>
> Thanks for the re-based patches.
>
> On the top of v35 patch, I can see one compilation warning.
>>
>> parallel.c: In function ‘LaunchParallelWorkers’:
>> parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
>>   int   i;
>>   ^
>
>
> Above warning is due to one extra semicolon added at the end of declaration line in v35-0003 patch. Please fix this in next version.
> +   int         nworkers_to_launch = Min(nworkers, pcxt->nworkers);;

Thanks. I will fix it in the next version patch.

>
> I will continue my testing on the top of v35 patch set and will post results.

While reviewing the v35 patch set and doing testing, I found that if we disable leader participation, then we launch 1 less parallel worker than the total number of indexes. (I am using max_parallel_workers = 20, max_parallel_maintenance_workers = 20.)

For example: if a table has 3 indexes and we give a parallel vacuum degree of 6 (with leader participation disabled), then I think we should launch 3 parallel workers, but we are launching 2 workers due to the below check.
+       nworkers = lps->nindexes_parallel_bulkdel - 1;
+
+   /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
+   nworkers = Min(nworkers, lps->pcxt->nworkers);

Please let me know your thoughts on this.
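
For example, something along these lines could account for it (just a
sketch, using the patch's leaderparticipates flag):

nworkers = lps->nindexes_parallel_bulkdel;
if (lps->leaderparticipates)
    nworkers--;            /* the leader itself takes one index */

/* Cap by the worker count computed at the beginning of parallel lazy vacuum */
nworkers = Min(nworkers, lps->pcxt->nworkers);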

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Nov 28, 2019 at 4:10 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Thu, 28 Nov 2019 at 13:32, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Wed, 27 Nov 2019 at 19:21, Mahendra Singh <mahi6run@gmail.com> wrote:
>> >
>> >
>> > Thanks for the re-based patches.
>> >
>> > On the top of v35 patch, I can see one compilation warning.
>> >>
>> >> parallel.c: In function ‘LaunchParallelWorkers’:
>> >> parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
>> >>   int   i;
>> >>   ^
>> >
>> >
>> > Above warning is due to one extra semicolon added at the end of declaration line in v35-0003 patch. Please fix
>> > this in next version.
>> > +   int         nworkers_to_launch = Min(nworkers, pcxt->nworkers);;
>>
>> Thanks. I will fix it in the next version patch.
>>
>> >
>> > I will continue my testing on the top of v35 patch set and will post results.
>
>
> While reviewing v35 patch set and doing testing, I found that if we disable leader participation, then we are
> launching 1 less parallel worker than total number of indexes. (I am using max_parallel_workers = 20,
> max_parallel_maintenance_workers = 20)
>
> For example: If table have 3 indexes and we gave 6 parallel vacuum degree (leader participation is disabled), then I
> think we should launch 3 parallel workers but we are launching 2 workers due to below check.
> +       nworkers = lps->nindexes_parallel_bulkdel - 1;
> +
> +   /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
> +   nworkers = Min(nworkers, lps->pcxt->nworkers);
>
> Please let me know your thoughts for this.
>

I think it is probably because this part of the code doesn't consider
PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION.  I think we can change it
if we want, but I am slightly nervous about the code complexity this
will bring; maybe that is fine, though.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, 28 Nov 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Nov 28, 2019 at 4:10 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> > On Thu, 28 Nov 2019 at 13:32, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
> >>
> >> On Wed, 27 Nov 2019 at 19:21, Mahendra Singh <mahi6run@gmail.com> wrote:
> >> >
> >> >
> >> > Thanks for the re-based patches.
> >> >
> >> > On the top of v35 patch, I can see one compilation warning.
> >> >>
> >> >> parallel.c: In function ‘LaunchParallelWorkers’:
> >> >> parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
> >> >>   int   i;
> >> >>   ^
> >> >
> >> >
> >> > Above warning is due to one extra semicolon added at the end of declaration line in v35-0003 patch. Please fix
> >> > this in next version.
> >> > +   int         nworkers_to_launch = Min(nworkers, pcxt->nworkers);;
> >>
> >> Thanks. I will fix it in the next version patch.
> >>
> >> >
> >> > I will continue my testing on the top of v35 patch set and will post results.
> >
> >
> > While reviewing v35 patch set and doing testing, I found that if we disable leader participation, then we are
> > launching 1 less parallel worker than total number of indexes. (I am using max_parallel_workers = 20,
> > max_parallel_maintenance_workers = 20)
> >
> > For example: If table have 3 indexes and we gave 6 parallel vacuum degree (leader participation is disabled), then I
> > think we should launch 3 parallel workers but we are launching 2 workers due to below check.
> > +       nworkers = lps->nindexes_parallel_bulkdel - 1;
> > +
> > +   /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
> > +   nworkers = Min(nworkers, lps->pcxt->nworkers);
> >
> > Please let me know your thoughts for this.

Thanks!

> I think it is probably because this part of the code doesn't consider
> PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION.  I think if we want we
> can change it but I am slightly nervous about the code complexity this
> will bring but maybe that is fine.

Right. I'll try to change it that way.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Nov 29, 2019 at 7:11 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Thu, 28 Nov 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > I think it is probably because this part of the code doesn't consider
> > PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION.  I think if we want we
> > can change it but I am slightly nervous about the code complexity this
> > will bring but maybe that is fine.
>
> Right. I'll try to change so that.
>

I am thinking that as PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION is
a debugging/testing facility, we should ideally separate it out from
the main patch.  BTW, I am hacking/reviewing the patch further, so I
request you to wait a few days before we do anything in this regard.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
Hello

Is it possible to change the order of index processing by the parallel leader? In the v35 patchset I see the following order:
- start parallel processes
- the leader and parallel workers process the index list and possibly skip some entries
- after that, the parallel leader rechecks the index list and processes the skipped indexes
- WaitForParallelWorkersToFinish

I think it would be better to:
- start parallel processes
- the parallel leader goes through the index list and processes only indexes with skip_parallel_index_vacuum = true
- parallel workers process indexes with skip_parallel_index_vacuum = false
- the parallel leader then participates in the remaining parallel-safe index processing
- WaitForParallelWorkersToFinish

This would mean less running time and a better load balance across the leader and workers in the case of a few non-parallel and a few parallel indexes.

(If this behavior is expected and required for some reason, we need a comment in the code.)

Also, a few notes on vacuumdb:
It seems we need a version check at least in vacuum_one_database and prepare_vacuum_command, similar to the SKIP_LOCKED or DISABLE_PAGE_SKIPPING features.

Discussion question: will the difference between the --parallel and --jobs parameters be confusing? Do we need more description for these options?

regards, Sergei



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Sat, 30 Nov 2019 at 19:18, Sergei Kornilov <sk@zsrv.org> wrote:
Hello

Its possible to change order of index processing by parallel leader? In v35 patchset I see following order:
- start parallel processes
- leader and parallel workers processed index lixt and possible skip some entries
- after that parallel leader recheck index list and process the skipped indexes
- WaitForParallelWorkersToFinish

I think it would be better to:
- start parallel processes
- parallel leader goes through index list and process only indexes which are skip_parallel_index_vacuum = true
- parallel workers processes indexes with skip_parallel_index_vacuum = false
- parallel leader start participate with remainings parallel-safe index processing
- WaitForParallelWorkersToFinish

This would be less running time and better load balance across leader and workers in case of few non-parallel and few parallel indexes.
(if this is expected and required by some reason, we need a comment in code)

Also few notes to vacuumdb:
Seems we need version check at least in vacuum_one_database and prepare_vacuum_command. Similar to SKIP_LOCKED or DISABLE_PAGE_SKIPPING features.
discussion question: difference between --parallel and --jobs parameters will be confusing? We need more description for this options

While doing testing with different server configuration settings, I am getting an error (ERROR:  no unpinned buffers available) in parallel vacuum, but normal vacuum is working fine.

Test Setup:
max_worker_processes = 40                                                      
autovacuum = off                                                                                                                    
shared_buffers = 128kB                                                          
max_parallel_workers = 40                                                      
max_parallel_maintenance_workers = 40                                          
vacuum_cost_limit = 2000                                                        
vacuum_cost_delay = 10                                                          

Table description: the table has 16 indexes (14 btree, 1 hash, 1 BRIN) and 1,000,000 tuples in total. I am deleting all the tuples and then firing the vacuum command.
Run attached .sql file (test_16_indexes.sql)
$ ./psql postgres
postgres=# \i test_16_indexes.sql

Re-start the server and do vacuum.
Case 1) normal vacuum:
postgres=# vacuum test ;
VACUUM
Time: 115174.470 ms (01:55.174)

Case 2) parallel vacuum using 10 parallel workers:
postgres=# vacuum (parallel 10)test ;
ERROR:  no unpinned buffers available
CONTEXT:  parallel worker
postgres=#

This error is coming due to the 128kB shared buffer setting. I think that because I launched 10 parallel workers and all of them are working in parallel, the small shared buffer leads to this error.

Is this expected behavior with a small shared buffer size, or should we try to come up with a solution for this?  Please let me know your thoughts.

Thanks and Regards
Mahendra Thalor
Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sat, Nov 30, 2019 at 7:18 PM Sergei Kornilov <sk@zsrv.org> wrote:
Hello

Its possible to change order of index processing by parallel leader? In v35 patchset I see following order:
- start parallel processes
- leader and parallel workers processed index lixt and possible skip some entries
- after that parallel leader recheck index list and process the skipped indexes
- WaitForParallelWorkersToFinish

I think it would be better to:
- start parallel processes
- parallel leader goes through index list and process only indexes which are skip_parallel_index_vacuum = true
- parallel workers processes indexes with skip_parallel_index_vacuum = false
- parallel leader start participate with remainings parallel-safe index processing
- WaitForParallelWorkersToFinish

This would be less running time and better load balance across leader and workers in case of few non-parallel and few parallel indexes.

Why do you think so?  I think the advantage of the current approach is that once the parallel workers are launched, the leader can process indexes that don't support parallelism.  So, both types of indexes can be processed at the same time.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
Hi

> I think the advantage of the current approach is that once the parallel workers are launched, the leader can process
> indexes that don't support parallelism.  So, both types of indexes can be processed at the same time.

In lazy_parallel_vacuum_or_cleanup_indexes I see:

    /*
      * Join as a parallel worker. The leader process alone does that in
     * case where no workers launched.
     */
    if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0)
        vacuum_or_cleanup_indexes_worker(Irel, nindexes, stats, lps->lvshared,
                                         vacrelstats->dead_tuples);

    /*
     * Here, the indexes that had been skipped during parallel index vacuuming
     * are remaining. If there are such indexes the leader process does vacuum
     * or cleanup them one by one.
     */
    vacuum_or_cleanup_skipped_indexes(vacrelstats, Irel, nindexes, stats,
                                      lps);

So the parallel leader will process parallel indexes first, along with the parallel workers, and skip the non-parallel
ones. Only after the end of the index list will the parallel leader process the non-parallel indexes one by one. In
case of equal index processing time, the parallel leader will process (count of parallel indexes)/(nworkers+1) + all
non-parallel indexes, while each parallel worker will process (count of parallel indexes)/(nworkers+1).  Am I wrong here?
 

regards, Sergei



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sun, 1 Dec 2019 at 11:06, Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hi
>
> > I think the advantage of the current approach is that once the parallel workers are launched, the leader can
> > process indexes that don't support parallelism.  So, both types of indexes can be processed at the same time.
>
> In lazy_parallel_vacuum_or_cleanup_indexes I see:
>
>         /*
>          * Join as a parallel worker. The leader process alone does that in
>          * case where no workers launched.
>          */
>         if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0)
>                 vacuum_or_cleanup_indexes_worker(Irel, nindexes, stats, lps->lvshared,
>                                                                                  vacrelstats->dead_tuples);
>
>         /*
>          * Here, the indexes that had been skipped during parallel index vacuuming
>          * are remaining. If there are such indexes the leader process does vacuum
>          * or cleanup them one by one.
>          */
>         vacuum_or_cleanup_skipped_indexes(vacrelstats, Irel, nindexes, stats,
>                                                                           lps);
>
> So parallel leader will process parallel indexes first along with parallel workers and skip non-parallel ones. Only
> after end of the index list parallel leader will process non-parallel indexes one by one. In case of equal index
> processing time parallel leader will process (count of parallel indexes)/(nworkers+1) + all non-parallel, while parallel
> workers will process (count of parallel indexes)/(nworkers+1).  Am I wrong here?
>

I think I got your point. Your proposal is that it's more efficient if
we make the leader process vacuum the indexes that can be processed only
by the leader process (i.e. indexes not supporting parallel index vacuum)
while the workers are processing indexes that do support parallel index
vacuum, right? That way, we can process indexes in parallel as much as
possible. So maybe we can call vacuum_or_cleanup_skipped_indexes first
and then call vacuum_or_cleanup_indexes_worker. But I'm not sure whether
there will be any parallel-safe indexes remaining after the leader has
finished vacuum_or_cleanup_indexes_worker, as described in your proposal.
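
So the leader-side flow would become roughly the following (a sketch
only, reusing the v35 function names; whether the leader still has
parallel-safe work left to join on is the open question above):

LaunchParallelWorkers(lps->pcxt, nworkers);

/*
 * While the workers process the parallel-safe indexes, the leader
 * handles the indexes that were skipped for parallel vacuum.
 */
vacuum_or_cleanup_skipped_indexes(vacrelstats, Irel, nindexes, stats, lps);

/* Then join the workers for whatever parallel-safe work remains. */
if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0)
    vacuum_or_cleanup_indexes_worker(Irel, nindexes, stats, lps->lvshared,
                                     vacrelstats->dead_tuples);

WaitForParallelWorkersToFinish(lps->pcxt);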

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, 30 Nov 2019 at 04:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 29, 2019 at 7:11 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Thu, 28 Nov 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > I think it is probably because this part of the code doesn't consider
> > > PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION.  I think if we want we
> > > can change it but I am slightly nervous about the code complexity this
> > > will bring but maybe that is fine.
> >
> > Right. I'll try to change so that.
> >
>
> I am thinking that as PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION is
> a debugging/testing facility, we should ideally separate this out from
> the main patch.  BTW, I am hacking/reviewing the patch further, so
> request you to wait for a few day's time before we do anything in this
> regard.

Sure, thank you so much. I'll wait for your comments and review.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, 30 Nov 2019 at 22:11, Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Sat, 30 Nov 2019 at 19:18, Sergei Kornilov <sk@zsrv.org> wrote:
>>
>> Hello
>>
>> Its possible to change order of index processing by parallel leader? In v35 patchset I see following order:
>> - start parallel processes
>> - leader and parallel workers processed index lixt and possible skip some entries
>> - after that parallel leader recheck index list and process the skipped indexes
>> - WaitForParallelWorkersToFinish
>>
>> I think it would be better to:
>> - start parallel processes
>> - parallel leader goes through index list and process only indexes which are skip_parallel_index_vacuum = true
>> - parallel workers processes indexes with skip_parallel_index_vacuum = false
>> - parallel leader start participate with remainings parallel-safe index processing
>> - WaitForParallelWorkersToFinish
>>
>> This would be less running time and better load balance across leader and workers in case of few non-parallel and
>> few parallel indexes.
>>
>> (if this is expected and required by some reason, we need a comment in code)
>>
>> Also few notes to vacuumdb:
>> Seems we need version check at least in vacuum_one_database and prepare_vacuum_command. Similar to SKIP_LOCKED or
>> DISABLE_PAGE_SKIPPING features.
>>
>> discussion question: difference between --parallel and --jobs parameters will be confusing? We need more description
>> for this options
>
>
> While doing testing with different server configuration settings, I am getting error (ERROR:  no unpinned buffers
> available) in parallel vacuum but normal vacuum is working fine.
>
> Test Setup:
> max_worker_processes = 40
> autovacuum = off
> shared_buffers = 128kB
> max_parallel_workers = 40
> max_parallel_maintenance_workers = 40
> vacuum_cost_limit = 2000
> vacuum_cost_delay = 10
>
> Table description: table have 16 indexes (14 btree, 1 hash, 1 BRIN) and total 10,00,000 tuples and I am deleting all
> the tuples, then firing vacuum command.
> Run attached .sql file (test_16_indexes.sql)
> $ ./psql postgres
> postgres=# \i test_16_indexes.sql
>
> Re-start the server and do vacuum.
> Case 1) normal vacuum:
> postgres=# vacuum test ;
> VACUUM
> Time: 115174.470 ms (01:55.174)
>
> Case 2) parallel vacuum using 10 parallel workers:
> postgres=# vacuum (parallel 10)test ;
> ERROR:  no unpinned buffers available
> CONTEXT:  parallel worker
> postgres=#
>
> This error is coming due to 128kB shared buffer. I think, I launched 10 parallel workers and all are working
> paralleling so due to less shared buffer, I am getting this error.
>

Thank you for testing!

> Is this expected behavior with small shared buffer size or we should try to come with a solution for this.  Please
> let me know your thoughts.

I think it's normal behavior when the shared buffer is not big enough.
Since 10 processes in total were processing different pages at the
same time and you set a small value for shared_buffers, the shared
buffer fills up easily, and you got the appropriate error. So I think in
this case we should consider either increasing the shared buffer size
or decreasing the parallel degree. I guess you could get this error
even by vacuuming 10 different tables concurrently instead.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
Hi

> I think I got your point. Your proposal is that it's more efficient if
> we make the leader process vacuum the index that can be processed only
> the leader process (i.e. indexes not supporting parallel index vacuum)
> while workers are processing indexes supporting parallel index vacuum,
> right? That way, we can process indexes in parallel as much as
> possible.

Right

> So maybe we can call vacuum_or_cleanup_skipped_indexes first
> and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> there are parallel-safe remaining indexes after the leader finished
> vacuum_or_cleanup_indexes_worker, as described on your proposal.

I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing
indexes that support the parallel index vacuum, along with parallel workers.
Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before vacuum_or_cleanup_indexes_worker
or something with similar effect.
If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation.

Sorry for my unclear english...

regards, Sergei



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Sun, Dec 1, 2019 at 11:01 PM Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hi
>
> > I think I got your point. Your proposal is that it's more efficient if
> > we make the leader process vacuum the index that can be processed only
> > the leader process (i.e. indexes not supporting parallel index vacuum)
> > while workers are processing indexes supporting parallel index vacuum,
> > right? That way, we can process indexes in parallel as much as
> > possible.
>
> Right
>
> > So maybe we can call vacuum_or_cleanup_skipped_indexes first
> > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> > there are parallel-safe remaining indexes after the leader finished
> > vacuum_or_cleanup_indexes_worker, as described on your proposal.
>
> I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing
> indexes that support the parallel index vacuum, along with parallel workers.
> Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before
> vacuum_or_cleanup_indexes_worker or something with similar effect.
> If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation.

+1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sun, Dec 1, 2019 at 11:01 PM Sergei Kornilov <sk@zsrv.org> wrote:
Hi

> I think I got your point. Your proposal is that it's more efficient if
> we make the leader process vacuum the index that can be processed only
> the leader process (i.e. indexes not supporting parallel index vacuum)
> while workers are processing indexes supporting parallel index vacuum,
> right? That way, we can process indexes in parallel as much as
> possible.

Right

> So maybe we can call vacuum_or_cleanup_skipped_indexes first
> and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> there are parallel-safe remaining indexes after the leader finished
> vacuum_or_cleanup_indexes_worker, as described on your proposal.

I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing indexes that support the parallel index vacuum, along with parallel workers.

Your idea is good, but remember we have always considered a leader as one worker if the leader can participate.  If we do what you are suggesting that won't be completely true as a leader will not completely participate in a parallel vacuum.  It might be that we don't consider leader equivalent to one worker in the presence of indexes that don't support a parallel vacuum, but I am not sure if that really matters much.  I think overall it should not matter much because we won't have that many indexes that don't support a parallel vacuum. 


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hi
>
> > I think I got your point. Your proposal is that it's more efficient if
> > we make the leader process vacuum the index that can be processed only
> > the leader process (i.e. indexes not supporting parallel index vacuum)
> > while workers are processing indexes supporting parallel index vacuum,
> > right? That way, we can process indexes in parallel as much as
> > possible.
>
> Right
>
> > So maybe we can call vacuum_or_cleanup_skipped_indexes first
> > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> > there are parallel-safe remaining indexes after the leader finished
> > vacuum_or_cleanup_indexes_worker, as described on your proposal.
>
> I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing
> indexes that support the parallel index vacuum, along with parallel workers.
> Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before
> vacuum_or_cleanup_indexes_worker or something with similar effect.
> If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation.

I think your idea might not work well in some cases. That is, I think
there are some cases where it's better if the leader participates in
the parallel vacuum as a worker as soon as possible, especially if the
table has many indexes that by design don't support parallel vacuum
(e.g. bulkdelete of brin, or indexes using VACUUM_OPTION_PARALLEL_COND_CLEANUP).
Suppose the table has 3 indexes that support parallel vacuum and take
5 sec, 10 sec and 10 sec to vacuum respectively, and 3 indexes that
don't support it and take 2 sec each. In the current patch we launch 2
workers. They each take one of the parallel-safe indexes and will take
5 sec and 10 sec. At the same time the leader processes the 3 indexes
that don't support parallel vacuum, which takes 6 sec. After the
worker finishes its 5-sec index it takes the remaining index and needs
10 sec more, so the total execution time will be 15 sec. On the other
hand, if the leader participated in the parallel vacuum first, the
total execution time could be 11 sec (taking 5 sec and 2 sec * 3).

It's just an example, I'm not saying your idea is bad. ISTM the idea
is good on an assumption that all indexes take the same time or take a
long time so I'd also like to consider if this is true even in
production and which approaches is better if we don't have such
assumption.
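
To make the arithmetic above easy to re-check, here is a tiny standalone sketch (purely illustrative, not related to the patch code) that computes the completion time of both orderings from the example:

#include <stdio.h>

static int
max3(int a, int b, int c)
{
	int	m = a > b ? a : b;

	return m > c ? m : c;
}

int
main(void)
{
	/*
	 * Ordering in the current patch: the two workers start on the 5s and
	 * 10s parallel-safe indexes, the leader first does the three 2s
	 * non-parallel indexes; the worker that finishes at 5s then picks up
	 * the remaining 10s index.
	 */
	printf("leader handles non-parallel indexes first: %d sec\n",
		   max3(5 + 10, 10, 2 + 2 + 2));		/* 15 */

	/*
	 * Ordering where the leader joins the parallel-safe indexes first
	 * (takes the 5s one), then does the three 2s indexes, while each
	 * worker takes one 10s index.
	 */
	printf("leader participates first: %d sec\n",
		   max3(5 + 2 + 2 + 2, 10, 10));		/* 11 */

	return 0;
}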

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
tushar
Date:
On 11/27/19 11:13 PM, Masahiko Sawada wrote:
> Thank you for reviewing this patch. All changes you made looks good to me.
>
> I thought I already have posted all v34 patches but didn't, sorry. So
> I've attached v35 patch set that incorporated your changes and it
> includes Dilip's patch for gist index (0001). These patches can be
> applied on top of the current HEAD and make check should pass.
> Regards,
While doing testing of this feature against the v35 patches (minus 004)
on master, I am getting a crash when a user connects to the server in
single-user mode and tries to perform vacuum (parallel 1). Output:

tushar@localhost bin]$ ./postgres --single -D data/  postgres
2019-12-03 12:49:26.967 +0530 [70300] LOG:  database system was 
interrupted; last known up at 2019-12-03 12:48:51 +0530
2019-12-03 12:49:26.987 +0530 [70300] LOG:  database system was not 
properly shut down; automatic recovery in progress
2019-12-03 12:49:26.990 +0530 [70300] LOG:  invalid record length at 
0/29F1638: wanted 24, got 0
2019-12-03 12:49:26.990 +0530 [70300] LOG:  redo is not required

PostgreSQL stand-alone backend 13devel
backend>
backend> vacuum full;
backend> vacuum (parallel 1);
TRAP: FailedAssertion("IsUnderPostmaster", File: "dsm.c", Line: 444)
./postgres(ExceptionalCondition+0x53)[0x8c6fa3]
./postgres[0x785ced]
./postgres(GetSessionDsmHandle+0xca)[0x49304a]
./postgres(InitializeParallelDSM+0x74)[0x519d64]
./postgres(heap_vacuum_rel+0x18d3)[0x4e47e3]
./postgres[0x631d9a]
./postgres(vacuum+0x444)[0x632f14]
./postgres(ExecVacuum+0x2bb)[0x63369b]
./postgres(standard_ProcessUtility+0x4cf)[0x7b312f]
./postgres[0x7b02c6]
./postgres[0x7b0dd3]
./postgres(PortalRun+0x162)[0x7b1b02]
./postgres[0x7ad874]
./postgres(PostgresMain+0x1002)[0x7aebf2]
./postgres(main+0x1ce)[0x48188e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4fe6908505]
./postgres[0x481b6a]
Aborted (core dumped)

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Dec 3, 2019 at 12:55 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 11/27/19 11:13 PM, Masahiko Sawada wrote:
> Thank you for reviewing this patch. All changes you made looks good to me.
>
> I thought I already have posted all v34 patches but didn't, sorry. So
> I've attached v35 patch set that incorporated your changes and it
> includes Dilip's patch for gist index (0001). These patches can be
> applied on top of the current HEAD and make check should pass.
> Regards,
While doing testing of this feature against v35- patches ( minus 004) on
Master ,

Thanks for doing the testing of these patches.
 
getting crash when user connect to server using single mode and try to
perform vacuum (parallel 1 ) o/p

tushar@localhost bin]$ ./postgres --single -D data/  postgres
2019-12-03 12:49:26.967 +0530 [70300] LOG:  database system was
interrupted; last known up at 2019-12-03 12:48:51 +0530
2019-12-03 12:49:26.987 +0530 [70300] LOG:  database system was not
properly shut down; automatic recovery in progress
2019-12-03 12:49:26.990 +0530 [70300] LOG:  invalid record length at
0/29F1638: wanted 24, got 0
2019-12-03 12:49:26.990 +0530 [70300] LOG:  redo is not required

PostgreSQL stand-alone backend 13devel
backend>
backend> vacuum full;
backend> vacuum (parallel 1);

Parallel vacuum shouldn't be allowed in standalone backends as we can't create DSM segments in that mode, and the same is true for parallel query.  It should internally proceed with a serial vacuum.  I'll fix it in the next version I am planning to post.  BTW, it seems that the same problem will be there for parallel create index.
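
A minimal sketch of the kind of guard being described (hypothetical placement; the actual fix may look different, and -1 here follows the patch's convention of "parallel vacuum disabled"):

/*
 * Hypothetical guard: fall back to serial vacuum when running in a
 * standalone backend, since dynamic shared memory segments (and hence
 * parallel workers) are not available there.
 */
if (params->nworkers >= 0 && !IsUnderPostmaster)
	params->nworkers = -1;		/* disable parallel vacuum */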

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Dec 3, 2019 at 12:56 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hi
>
> > I think I got your point. Your proposal is that it's more efficient if
> > we make the leader process vacuum the index that can be processed only
> > the leader process (i.e. indexes not supporting parallel index vacuum)
> > while workers are processing indexes supporting parallel index vacuum,
> > right? That way, we can process indexes in parallel as much as
> > possible.
>
> Right
>
> > So maybe we can call vacuum_or_cleanup_skipped_indexes first
> > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> > there are parallel-safe remaining indexes after the leader finished
> > vacuum_or_cleanup_indexes_worker, as described on your proposal.
>
> I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing indexes that support the parallel index vacuum, along with parallel workers.
> Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before vacuum_or_cleanup_indexes_worker or something with similar effect.
> If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation.

I think your idea might not work well in some cases.

Good point.  I am also not sure whether it is a good idea to make the suggested change, but I think adding a comment on those lines is not a bad idea which I have done in the attached patch.

I have made some other changes as well.
1. 
+ if (VacuumSharedCostBalance != NULL)
  {
- double msec;
+ int nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);
+
+ /* At least count itself */
+ Assert(nworkers >= 1);
+
+ /* Update the shared cost balance value atomically */
+ while (true)
+ {
+ uint32 shared_balance;
+ uint32 new_balance;
+ uint32 local_balance;
+
+ msec = 0;
+
+ /* compute new balance by adding the local value */
+ shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ new_balance = shared_balance + VacuumCostBalance;
+ /* also compute the total local balance */
+ local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
+
+ if ((new_balance >= VacuumCostLimit) &&
+ (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
+ {
+ /* compute sleep time based on the local cost balance */
+ msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
+ new_balance = shared_balance - VacuumCostBalanceLocal;
+ VacuumCostBalanceLocal = 0;
+ }
+
+ if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
+    &shared_balance,
+    new_balance))
+ {
+ /* Updated successfully, break */
+ break;
+ }
+ }
+
+ VacuumCostBalanceLocal += VacuumCostBalance;

I see multiple problems with this code. (a) if the VacuumSharedCostBalance is changed by the time of compare and exchange, then the next iteration might not compute the correct values as you might have reset VacuumCostBalanceLocal by that time. (b) In code line, new_balance = shared_balance - VacuumCostBalanceLocal, you need to use new_balance instead of shared_balance, otherwise, it won't account for the balance of the latest cycle.  (c) In code line, msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;, I think you need to use local_balance for reasons similar to (b). (d) I think we can write this code with a lesser number of variables.

I have fixed all these problems and used a slightly different way to compute the parallel delay.  See compute_parallel_delay() in the attached delta patch.
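
For readers without the attachment, a rough sketch of the kind of approach being described (a reconstruction for illustration only, not the compute_parallel_delay() in the attached delta patch, which may differ in detail) is:

/*
 * Sketch of shared cost-based delay accounting: the just-incurred local
 * cost is added to the shared balance atomically, and the worker sleeps
 * only when the shared balance exceeds the limit and its own contribution
 * is large enough; only that contribution is then given back.
 */
static double
compute_parallel_delay_sketch(void)
{
	double		msec = 0;
	uint32		shared_balance;
	uint32		local_balance;
	int			nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);

	/* At least count itself */
	Assert(nworkers >= 1);

	/* Add the cost incurred since the last check; get the new balance. */
	shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
											 VacuumCostBalance);

	/* This worker's total contribution since it last slept. */
	local_balance = VacuumCostBalanceLocal + VacuumCostBalance;

	if (shared_balance >= VacuumCostLimit &&
		local_balance > 0.5 * ((double) VacuumCostLimit / nworkers))
	{
		/* Sleep in proportion to this worker's own contribution ... */
		msec = VacuumCostDelay * local_balance / VacuumCostLimit;

		/* ... and give that contribution back to the shared balance. */
		pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, local_balance);
		VacuumCostBalanceLocal = 0;
	}
	else
		VacuumCostBalanceLocal += VacuumCostBalance;

	/* The local cost has been accounted for either way. */
	VacuumCostBalance = 0;

	return msec;
}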

2.
+ /* Setup the shared cost-based vacuum delay and launch workers*/
+ if (nworkers > 0)
+ {
+ /*
+ * Reset the local value so that we compute cost balance during
+ * parallel index vacuuming.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ LaunchParallelWorkers(lps->pcxt, nworkers);
+
+ /* Enable shared costing iff we process indexes in parallel. */
+ if (lps->pcxt->nworkers_launched > 0)
+ {
+ /* Enable shared cost balance */
+ VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
+ VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay.
+ */
+ pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
+ pg_atomic_write_u32(VacuumActiveNWorkers, 0);

This code has issues.  We can't initialize VacuumSharedCostBalance/VacuumActiveNWorkers after launching workers as by that time some other worker would have changed its value.  This has been reported offlist by Mahendra and I have fixed it.
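
One possible ordering that avoids the race (a sketch only, reusing the names from the quoted hunk; the actual fix in v35-0003 may differ) is:

/* Initialize the shared counters in the DSM *before* any worker can run. */
pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);

LaunchParallelWorkers(lps->pcxt, nworkers);

/* Enable shared costing in this backend only if any worker was launched. */
if (lps->pcxt->nworkers_launched > 0)
{
	VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
	VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}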

3. Changed the name of functions which were too long and I think new names are more meaningful.  If you don't agree with these changes, then we can discuss it.

4. Changed the order of parameters in many functions to match with existing code. 

5. Refactored the code at a few places so that it can be easy to follow.

6. Added/Edited many comments and other cosmetic changes.

You can find all these changes in v35-0003-Code-review-amit.patch.

Few other things, I would like you to consider.
1.  I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainly a debug/test aid.  You can also fix the problem reported by Mahendra in that context.

2. I think if we can somehow disallow very small indexes from using parallel workers, then it will be better.  Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?
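
As a rough illustration only (the helper name is hypothetical; whether and how such a check would be wired into the patch is exactly the open question), the cut-off could be as simple as:

/*
 * Hypothetical helper: let an index participate in parallel vacuum only if
 * its size reaches min_parallel_index_scan_size (a GUC measured in blocks);
 * smaller indexes would be left to the leader.
 */
static bool
index_big_enough_for_parallel_vacuum(Relation indrel)
{
	return RelationGetNumberOfBlocks(indrel) >= min_parallel_index_scan_size;
}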

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Dec 3, 2019 at 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other things, I would like you to consider.
1.  I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainly a debug/test aid.  You can also fix the problem reported by Mahendra in that context.

2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better.   Can we use  min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?

Forgot one minor point.  Please run pgindent on all the patches.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Tue, 3 Dec 2019 at 16:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 3, 2019 at 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other things, I would like you to consider.
1.  I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainly a debug/test aid.  You can also fix the problem reported by Mahendra in that context.

2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better.   Can we use  min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?

Forgot one minor point.  Please run pgindent on all the patches.

While reviewing and testing v35 patch set, I noticed some problems. Below are some comments:

1.
  /*
+ * Since parallel workers cannot access data in temporary tables, parallel
+ * vacuum is not allowed for temporary relation. However rather than
+ * skipping vacuum on the table, just disabling parallel option is better
+ * option in most cases.
+ */
+ if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
+ {
+ ereport(WARNING,
+ (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
+ RelationGetRelationName(onerel))));
+ params->nworkers = 0;
+ }

Here, I think we should set params->nworkers = -1 to disable parallel vacuum for temporary tables. I noticed that even after the warning, we were doing vacuum in parallel mode and were launching parallel workers, which was wrong.
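
That is, against the hunk quoted above, the suggested change would presumably be just (a sketch, subject to the patch author's judgement):

	ereport(WARNING,
			(errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
					RelationGetRelationName(onerel))));
	params->nworkers = -1;		/* was 0: make sure no parallel workers are launched */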

2.
Amit suggested that I check the time taken by the vacuum.sql regression test.

vacuum                       ... ok        20684 ms      -------on the top of v35 patch set
vacuum                       ... ok         1020 ms   -------without v35 patch set

Here, we can see that the time taken by the vacuum test case increased a lot due to the parallel vacuum test cases, so I will try to come up with a smaller test case.

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 3, 2019 at 12:56 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk@zsrv.org> wrote:
>> >
>> > Hi
>> >
>> > > I think I got your point. Your proposal is that it's more efficient if
>> > > we make the leader process vacuum the index that can be processed only
>> > > the leader process (i.e. indexes not supporting parallel index vacuum)
>> > > while workers are processing indexes supporting parallel index vacuum,
>> > > right? That way, we can process indexes in parallel as much as
>> > > possible.
>> >
>> > Right
>> >
>> > > So maybe we can call vacuum_or_cleanup_skipped_indexes first
>> > > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
>> > > there are parallel-safe remaining indexes after the leader finished
>> > > vacuum_or_cleanup_indexes_worker, as described on your proposal.
>> >
>> > I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start
>> > processing indexes that support the parallel index vacuum, along with parallel workers.
>> > Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before
>> > vacuum_or_cleanup_indexes_worker or something with similar effect.
>> > If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation.
>>
>> I think your idea might not work well in some cases.
>
>
> Good point.  I am also not sure whether it is a good idea to make the suggested change, but I think adding a comment
> on those lines is not a bad idea which I have done in the attached patch.

Thank you for updating the patch!

>
> I have made some other changes as well.
> 1.
> + if (VacuumSharedCostBalance != NULL)
>   {
> - double msec;
> + int nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);
> +
> + /* At least count itself */
> + Assert(nworkers >= 1);
> +
> + /* Update the shared cost balance value atomically */
> + while (true)
> + {
> + uint32 shared_balance;
> + uint32 new_balance;
> + uint32 local_balance;
> +
> + msec = 0;
> +
> + /* compute new balance by adding the local value */
> + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> + new_balance = shared_balance + VacuumCostBalance;
> + /* also compute the total local balance */
> + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> +
> + if ((new_balance >= VacuumCostLimit) &&
> + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> + {
> + /* compute sleep time based on the local cost balance */
> + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> + new_balance = shared_balance - VacuumCostBalanceLocal;
> + VacuumCostBalanceLocal = 0;
> + }
> +
> + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> +    &shared_balance,
> +    new_balance))
> + {
> + /* Updated successfully, break */
> + break;
> + }
> + }
> +
> + VacuumCostBalanceLocal += VacuumCostBalance;
>
> I see multiple problems with this code. (a) if the VacuumSharedCostBalance is changed by the time of compare and
> exchange, then the next iteration might not compute the correct values as you might have reset VacuumCostBalanceLocal by
> that time. (b) In code line, new_balance = shared_balance - VacuumCostBalanceLocal, you need to use new_balance instead
> of shared_balance, otherwise, it won't account for the balance of the latest cycle.  (c) In code line, msec =
> VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;, I think you need to use local_balance for reasons similar
> to (b). (d) I think we can write this code with a lesser number of variables.

In your code, I think if two workers enter to compute_parallel_delay
function at the same time, they add their local balance to
VacuumSharedCostBalance and both workers sleep because both values
reach the VacuumCostLimit. But one of the two workers should not sleep
in this case.

>
> I have fixed all these problems and used a slightly different way to compute the parallel delay.  See
> compute_parallel_delay() in the attached delta patch.
>
> 2.
> + /* Setup the shared cost-based vacuum delay and launch workers*/
> + if (nworkers > 0)
> + {
> + /*
> + * Reset the local value so that we compute cost balance during
> + * parallel index vacuuming.
> + */
> + VacuumCostBalance = 0;
> + VacuumCostBalanceLocal = 0;
> +
> + LaunchParallelWorkers(lps->pcxt, nworkers);
> +
> + /* Enable shared costing iff we process indexes in parallel. */
> + if (lps->pcxt->nworkers_launched > 0)
> + {
> + /* Enable shared cost balance */
> + VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
> + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
> +
> + /*
> + * Set up shared cost balance and the number of active workers for
> + * vacuum delay.
> + */
> + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
> + pg_atomic_write_u32(VacuumActiveNWorkers, 0);
>
> This code has issues.  We can't initialize VacuumSharedCostBalance/VacuumActiveNWorkers after launching workers as by
> that time some other worker would have changed its value.  This has been reported offlist by Mahendra and I have fixed
> it.
>
> 3. Changed the name of functions which were too long and I think new names are more meaningful.  If you don't agree
> with these changes, then we can discuss it.
>
> 4. Changed the order of parameters in many functions to match with existing code.
>
> 5. Refactored the code at a few places so that it can be easy to follow.
>
> 6. Added/Edited many comments and other cosmetic changes.
>
> You can find all these changes in v35-0003-Code-review-amit.patch.

I've confirmed these changes and these look good to me.

> Few other things, I would like you to consider.
> 1.  I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainly
> a debug/test aid.  You can also fix the problem reported by Mahendra in that context.

Agreed. I'll create a patch for disable_parallel_leader_participation.

> 2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better.   Can we
use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum? 

I think it's a good idea but I'm concerned that the default value of
min_parallel_index_scan_size, 512kB, is too small for parallel vacuum
purpose. Given that people who want to use parallel vacuum are likely
to have a big table the indexes that can be skipped by the default
value would be only brin indexes, I think. Also I guess that the
reason why the default value is small is that
min_parallel_index_scan_size compares to the number of blocks being
scanned during index scan, not whole index. On the other hand in
parallel vacuum we will compare it to the whole index blocks because
the index vacuuming is always full scan. So I'm also concerned that
user will get confused about reasonable setting.

As another idea how about using min_parallel_table_scan_size instead?
That is, we cannot do parallel vacuum on the table smaller than that
value. I think this idea had already been proposed once in this thread
but now I think it's also a good idea.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 3 Dec 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 3, 2019 at 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Few other things, I would like you to consider.
>> 1.  I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is
>> mainly a debug/test aid.  You can also fix the problem reported by Mahendra in that context.
>>
>> 2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better.   Can we
use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?
 
>
>
> Forgot one minor point.  Please run pgindent on all the patches.

Got it. From next time, I will run pgindent before sending patches.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> In your code, I think if two workers enter to compute_parallel_delay
> function at the same time, they add their local balance to
> VacuumSharedCostBalance and both workers sleep because both values
> reach the VacuumCostLimit.
>

True, but isn't it more appropriate because the local cost of any
worker should be ideally added to shared cost as soon as it occurred?
I mean to say that we are not adding any cost in shared balance
without actually incurring it.   Then we also consider the individual
worker's local balance as well and sleep according to local balance.

>
> > 2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better.   Can we
use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?
 
>
> I think it's a good idea but I'm concerned that the default value of
> min_parallel_index_scan_size, 512kB, is too small for parallel vacuum
> purpose. Given that people who want to use parallel vacuum are likely
> to have a big table the indexes that can be skipped by the default
> value would be only brin indexes, I think.
>

Yeah or probably hash indexes in some cases.

> Also I guess that the
> reason why the default value is small is that
> min_parallel_index_scan_size compares to the number of blocks being
> scanned during index scan, not whole index. On the other hand in
> parallel vacuum we will compare it to the whole index blocks because
> the index vacuuming is always full scan. So I'm also concerned that
> user will get confused about reasonable setting.
>

This setting is about how much of index we are going to scan, so I am
not sure if it matters whether it is part or full index scan.  Also,
in an index scan, we will launch multiple workers to scan that index
and here we will consider launching just one worker.

> As another idea how about using min_parallel_table_scan_size instead?
>

Hmm, yeah, that can be another option, but it might not be a good idea
for partial indexes.

> That is, we cannot do parallel vacuum on the table smaller than that
> value.
>

Yeah, that makes sense, but I feel if we can directly target index
scan size that may be a better option.  If we can't use
min_parallel_index_scan_size, then we can consider this.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Dec 4, 2019 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > In your code, I think if two workers enter to compute_parallel_delay
> > function at the same time, they add their local balance to
> > VacuumSharedCostBalance and both workers sleep because both values
> > reach the VacuumCostLimit.
> >
>
> True, but isn't it more appropriate because the local cost of any
> worker should be ideally added to shared cost as soon as it occurred?
> I mean to say that we are not adding any cost in shared balance
> without actually incurring it.   Then we also consider the individual
> worker's local balance as well and sleep according to local balance.

Even I think it is better to add the balance to the shared balance at
the earliest opportunity.  Just consider the case that there are 5
workers and all have I/O balance of 20, and VacuumCostLimit is 50.  So
actually, their combined balance is 100 (which is double the
VacuumCostLimit) but if we don't add immediately then none of the
workers will sleep and it may go to the next cycle which is not very
good. OTOH, if we add 20 immediately then check the shared balance
then all the workers might go for sleep if their local balances have
reached the limit but they will only sleep in proportion to their
local balance.  So IMHO, adding the current balance to shared balance
early is more close to the model we are trying to implement i.e.
shared cost accounting.

>
> >
> > > 2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better.   Can
> > > we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?
> >
> > I think it's a good idea but I'm concerned that the default value of
> > min_parallel_index_scan_size, 512kB, is too small for parallel vacuum
> > purpose. Given that people who want to use parallel vacuum are likely
> > to have a big table the indexes that can be skipped by the default
> > value would be only brin indexes, I think.
> >
>
> Yeah or probably hash indexes in some cases.
>
> > Also I guess that the
> > reason why the default value is small is that
> > min_parallel_index_scan_size compares to the number of blocks being
> > scanned during index scan, not whole index. On the other hand in
> > parallel vacuum we will compare it to the whole index blocks because
> > the index vacuuming is always full scan. So I'm also concerned that
> > user will get confused about reasonable setting.
> >
>
> This setting is about how much of index we are going to scan, so I am
> not sure if it matters whether it is part or full index scan.  Also,
> in an index scan, we will launch multiple workers to scan that index
> and here we will consider launching just one worker.
>
> > As another idea how about using min_parallel_table_scan_size instead?
> >
>
> Hmm, yeah, that can be another option, but it might not be a good idea
> for partial indexes.
>
> > That is, we cannot do parallel vacuum on the table smaller than that
> > value.
> >
>
> Yeah, that makes sense, but I feel if we can directly target index
> scan size that may be a better option.  If we can't use
> min_parallel_index_scan_size, then we can consider this.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Dec 4, 2019 at 2:01 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 3 Dec 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Forgot one minor point.  Please run pgindent on all the patches.
>
> Got it. I will run pgindent before sending patch from next time.
>

Today, I again read the patch and found a few more minor comments:

1.
void
-LaunchParallelWorkers(ParallelContext *pcxt)
+LaunchParallelWorkers(ParallelContext *pcxt, int nworkers)


I think we should add a comment for this API change which should
indicate why we need to pass nworkers as an additional parameter when
the context itself contains information about the number of workers.

2.
+ * At the beginning of a lazy vacuum (at lazy_scan_heap) we prepare the
+ * parallel context and initialize the DSM segment that contains shared
+ * information as well as the memory space for storing dead tuples.
+ * When starting either index vacuuming or index cleanup, we launch parallel
+ * worker processes.  Once all indexes are processed the parallel worker
+ * processes exit.  And then the leader process re-initializes the parallel
+ * context so that it can use the same DSM for multiple passses of index
+ * vacuum and for performing index cleanup.

a. /And then the leader/After that, the leader ..  This will avoid
using 'and' two times in this sentence.
b. typo, /passses/passes

3.
+ * Macro to check if we are in a parallel lazy vacuum.  If true, we are
+ * in the parallel mode and prepared the DSM segment.

How about changing it slightly as /and prepared the DSM segment./ and
the DSM segment is initialized.?

4.
-
 /* non-export function prototypes */
 static void lazy_scan_heap(Relation onerel, VacuumParams *params,
    LVRelStats *vacrelstats, Relation *Irel, int nindexes,
    bool aggressive);

Spurious change, please remove.  I think this is done by me in one of
the versions.

5.
+ * function we exit from parallel mode.  Index bulk-deletion results are
+ * stored in the DSM segment and update index statistics as a whole after
+ * exited from parallel mode since all writes are not allowed during parallel
+ * mode.

Can we slightly change the above sentence as "Index bulk-deletion
results are stored in the DSM segment and we update index statistics
as a whole after exited from parallel mode since writes are not
allowed during the parallel mode."?

6.
+ /*
+ * Reset the local value so that we compute cost balance during
+ * parallel index vacuuming.
+ */

This comment is a bit unclear.  How about "Reset the local cost values
for leader backend as we have already accumulated the remaining
balance of heap."?

7.
+ /* Do vacuum or cleanup one index */

How about changing it as: Do vacuum or cleanup of the index?

8.
+ * The copying the result normally
+ * happens only after the first time of index vacuuming.

/The copying the ../The copying of the

9.
+ /*
+ * no longer need the locally allocated result and now
+ * stats[idx] points to the DSM segment.
+ */

How about changing it as below:
"Now that the stats[idx] points to the DSM segment, we don't need the
locally allocated results."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 4 Dec 2019 at 04:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 4, 2019 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > In your code, I think if two workers enter to compute_parallel_delay
> > > function at the same time, they add their local balance to
> > > VacuumSharedCostBalance and both workers sleep because both values
> > > reach the VacuumCostLimit.
> > >
> >
> > True, but isn't it more appropriate because the local cost of any
> > worker should be ideally added to shared cost as soon as it occurred?
> > I mean to say that we are not adding any cost in shared balance
> > without actually incurring it.   Then we also consider the individual
> > worker's local balance as well and sleep according to local balance.
>
> Even I think it is better to add the balance to the shared balance at
> the earliest opportunity.  Just consider the case that there are 5
> workers and all have I/O balance of 20, and VacuumCostLimit is 50.  So
> Actually, there combined balance is 100 (which is double of the
> VacuumCostLimit) but if we don't add immediately then none of the
> workers will sleep and it may go to the next cycle which is not very
> good. OTOH, if we add 20 immediately then check the shared balance
> then all the workers might go for sleep if their local balances have
> reached the limit but they will only sleep in proportion to their
> local balance.  So IMHO, adding the current balance to shared balance
> early is more close to the model we are trying to implement i.e.
> shared cost accounting.

I agree with adding the balance as soon as it is incurred. But the
problem I'm concerned about is this: suppose we have 4 workers, the
cost limit is 100 and the shared balance is now 95. Two workers, whose
local balances (VacuumCostBalanceLocal) are 40, consume I/O, add 10 to
their local balances and enter the compute_parallel_delay function at
the same time. One worker adds 10 to the shared
balance (VacuumSharedCostBalance) and the other worker also adds 10 to
the shared balance. The first worker then subtracts its local balance
from the shared balance and sleeps because the shared cost is now 115
(> the cost limit) and its local balance is 50 (> 0.5*(100/4)). The
other worker also does the same for the same reason. On the other
hand, if the two workers do that serially, only one worker sleeps and
the other doesn't, because the total shared cost will be 75 when the
later worker enters the condition. At first glance it looks like a
concurrency problem, but is that the expected behaviour?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Nov 21, 2019 at 12:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch,  I
> am a bit doubtful about this kind of arrangement, where the code in
> the "if" is always unreachable with the current AMs.  I am not sure
> what is the best way to handle this, should we just drop the
> amestimateparallelvacuum altogether?  Because currently, we are just
> providing a size estimate function without a copy function,  even if
> the in future some Am give an estimate about the size of the stats, we
> can not directly memcpy the stat from the local memory to the shared
> memory, we might then need a copy function also from the AM so that it
> can flatten the stats and store in proper format?

I agree that it's a crock to add an AM method that is never used for
anything. That's just asking for the design to prove buggy and
inadequate. One way to avoid this would be to require that every AM
that wants to support parallel vacuuming supply this method, and if it
wants to just return sizeof(IndexBulkDeleteResult), then it can. But I
also think someone should modify one of the AMs to use a
differently-sized object, and then see whether they can really make
parallel vacuum work with this patch. If, as you speculated here, it
needs another API, then we should add both of them or neither. A
half-baked solution is worse than nothing at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Mon, Dec 2, 2019 at 2:26 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> It's just an example, I'm not saying your idea is bad. ISTM the idea
> is good on an assumption that all indexes take the same time or take a
> long time so I'd also like to consider if this is true even in
> production and which approaches is better if we don't have such
> assumption.

I think his idea is good. You're not wrong when you say that there are
cases where it could work out badly, but I think on the whole it's a
clear improvement. Generally, the indexes should be of relatively
similar size because index size is driven by table size; it's true
that different AMs could result in different-size indexes, but it
seems like a stretch to suppose that the indexes that don't support
parallelism are also going to be the little tiny ones that go fast
anyway, unless we have some evidence that this is really true. I also
wonder whether we really need the option to disable parallel vacuum in
the first place. Maybe I'm looking in the right place, but I'm not
finding anything in the way of comments or documentation explaining
why some AMs don't support it. It's an important design point, and
should be documented.

I also think PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION seems like a
waste of space. For parallel queries, there is a trade-off between
having the leader do work (which can speed up the query) and having it
remain idle so that it can immediately absorb tuples from workers and
keep them from having their tuple queues fill up (which can speed up
the query). But here, at least as I understand it, there's no such
trade-off. Having the leader fail to participate is just a loser.
Maybe it's useful to test while debugging the patch, but why should
the committed code support it?

To respond to another point from a different part of the email chain,
the reason why LaunchParallelWorkers() does not take an argument for
the number of workers is because I believed that the caller should
always know how many workers they're going to want at the time they
CreateParallelContext(). Increasing it later is not possible, because
the DSM has already sized based on the count provided. I grant that it
would be possible to allow the number to be reduced later, but why
would you want to do that? Why not get the number right when first
creating the DSM?

Is there any legitimate use case for parallel vacuum in combination
with vacuum cost delay? As I understand it, any serious vacuuming is
going to be I/O-bound, so can you really need multiple workers at the
same time that you are limiting the I/O rate? Perhaps it's possible if
the I/O limit is so high that a single worker can't hit the limit by
itself, but multiple workers can, but it seems like a bad idea to
spawn more workers and then throttle them rather than just starting
fewer workers. In any case, the algorithm suggested in vacuumlazy.c
around the definition of VacuumSharedCostBalance seems almost the
opposite of what you probably want. The idea there seems to be that
you shouldn't make a worker sleep if it hasn't actually got to do
anything. Apparently the idea is that if you have 3 workers and you
only have enough I/O rate for 1 worker, you want all 3 workers to run
at once, so that the I/O is random, rather than having them run 1 at a
time, so that the I/O is sequential. That seems counterintuitive. It
could be right if the indexes are in different tablespaces, but if
they are in the same tablespace it's probably wrong. I guess it could
still be right if there's just so much I/O that you aren't going to
run out ever, and the more important consideration is that you don't
know which index will take longer to vacuum and so want to start them
all at the same time so that you don't accidentally start the slow one
last, but that sounds like a stretch. I think this whole area needs
more thought. I feel like we're trying to jam a go-slower feature and
a go-faster feature into the same box.

+ * vacuum and for performing index cleanup.  Note that all parallel workers
+ * live during either index vacuuming or index cleanup but the leader process
+ * neither exits from the parallel mode nor destroys the parallel context.
+ * For updating the index statistics, since any updates are not allowed during
+ * parallel mode we update the index statistics after exited from the parallel

The first of these sentences ("Note that all...") is not very clear to
me, and seems like it may amount to a statement that the leader
doesn't try to destroy the parallel context too early, but since I
don't understand it, maybe that's not what it is saying. The second
sentence needs exited -> exiting, and maybe some more work on the
grammar, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Dec 5, 2019 at 12:21 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 4 Dec 2019 at 04:57, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Dec 4, 2019 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > In your code, I think if two workers enter to compute_parallel_delay
> > > > function at the same time, they add their local balance to
> > > > VacuumSharedCostBalance and both workers sleep because both values
> > > > reach the VacuumCostLimit.
> > > >
> > >
> > > True, but isn't it more appropriate because the local cost of any
> > > worker should be ideally added to shared cost as soon as it occurred?
> > > I mean to say that we are not adding any cost in shared balance
> > > without actually incurring it.   Then we also consider the individual
> > > worker's local balance as well and sleep according to local balance.
> >
> > Even I think it is better to add the balance to the shared balance at
> > the earliest opportunity.  Just consider the case that there are 5
> > workers and all have I/O balance of 20, and VacuumCostLimit is 50.  So
> > Actually, there combined balance is 100 (which is double of the
> > VacuumCostLimit) but if we don't add immediately then none of the
> > workers will sleep and it may go to the next cycle which is not very
> > good. OTOH, if we add 20 immediately then check the shared balance
> > then all the workers might go for sleep if their local balances have
> > reached the limit but they will only sleep in proportion to their
> > local balance.  So IMHO, adding the current balance to shared balance
> > early is more close to the model we are trying to implement i.e.
> > shared cost accounting.
>
> I agree to add the balance as soon as it occurred. But the problem I'm
> concerned is, let's suppose we have 4 workers, the cost limit is 100
> and the shared balance is now 95. Two workers, whom local
> balance(VacuumCostBalanceLocal) are 40, consumed I/O, added 10 to
> theirs local balance and entered compute_parallel_delay function at
> the same time. One worker adds 10 to the shared
> balance(VacuumSharedCostBalance) and another worker also adds 10 to
> the shared balance. The one worker then subtracts the local balance
> from the shared balance and sleeps because the shared cost is now 115
> (> the cost limit) and its local balance is 50 (> 0.5*(100/4)). Even
> another worker also does the same for the same reason. On the other
> hand if two workers do that serially, only one worker sleeps and
> another worker doesn't because the total shared cost will be 75 when
> the later worker enters the condition. At first glance it looks like a
> concurrency problem but is that expected behaviour?

If both workers sleep then the remaining shared balance will be 15 and
their local balances will be 0. OTOH, if only one worker sleeps then
the remaining shared balance will be 75, so the second worker has
missed this sleep cycle, but at the next opportunity, when the shared
value again reaches 100, if the second worker has performed more I/O
it will sleep for a longer duration.

Even if we add it to the shared balance later (like you were doing
earlier), we can reproduce similar behavior: suppose the shared
balance is 85 and both workers have a local balance of 40 each. Now,
each worker has done I/O of 10.  If we don't add to the shared balance
immediately, then both workers will see the balance as 85+10=95, so
neither of them will sleep. OTOH, if they do it serially, the first
worker will add 10 and make it 95, and then the second worker will
locally check 95+10, which is more than 100, and it will sleep. Right?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Dec 5, 2019 at 1:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Dec 2, 2019 at 2:26 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > It's just an example, I'm not saying your idea is bad. ISTM the idea
> > is good on an assumption that all indexes take the same time or take a
> > long time so I'd also like to consider if this is true even in
> > production and which approaches is better if we don't have such
> > assumption.
>
> I think his idea is good. You're not wrong when you say that there are
> cases where it could work out badly, but I think on the whole it's a
> clear improvement. Generally, the indexes should be of relatively
> similar size because index size is driven by table size; it's true
> that different AMs could result in different-size indexes, but it
> seems like a stretch to suppose that the indexes that don't support
> parallelism are also going to be the little tiny ones that go fast
> anyway, unless we have some evidence that this is really true. I also
> wonder whether we really need the option to disable parallel vacuum in
> the first place.
>

I think it could be required for the cases where the AM doesn't have a
way (or it is difficult to come up with a way) to communicate the
stats allocated by the first ambulkdelete call to the subsequent ones
until amvacuumcleanup.  Currently, we have such a case for the Gist
index, see email thread [1]. Though we have come up with a way to
avoid that for Gist indexes, I am not sure if we can assume that it is
the case for any possible index AM especially when there is a
provision that indexAM can have additional stats information.  In the
worst case, if we have to modify some existing index AM like we did
for the Gist index, we need such a provision so that it is possible.
In the ideal case, the index AM should provide a way to copy such
stats, but we can't assume that, so we come up with this option.

We have used this for dummy_index_am which also provides a way to test it.

> Maybe I'm looking in the right place, but I'm not
> finding anything in the way of comments or documentation explaining
> why some AMs don't support it. It's an important design point, and
> should be documented.
>

Agreed.  We should do this.

> I also think PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION seems like a
> waste of space. For parallel queries, there is a trade-off between
> having the leader do work (which can speed up the query) and having it
> remain idle so that it can immediately absorb tuples from workers and
> keep them from having their tuple queues fill up (which can speed up
> the query). But here, at least as I understand it, there's no such
> trade-off. Having the leader fail to participate is just a loser.
> Maybe it's useful to test while debugging the patch,
>

Yeah, it is primarily a debugging/testing aid patch and it helped us
in discovering some issues.  During development, it is being used for
testing purposes as well.  This is the reason the code is under #ifdef.
> but why should
> the committed code support it?
>

I am also not sure whether we should commit this part of the code, and
that is why I suggested in one of the above emails keeping it as a
separate patch.  We can later see whether to commit this code.  Now,
the point in its favor is that we already have a similar define
(DISABLE_LEADER_PARTICIPATION) for parallel create index, so having it
here is not a bad idea.  I think it might help us in debugging bugs
where we want to force the index to be vacuumed by some worker.
We might want to have something like force_parallel_mode for
testing/debugging purposes, but I am not sure which is better.  I think
having something as a debugging aid for such features is good.

> To respond to another point from a different part of the email chain,
> the reason why LaunchParallelWorkers() does not take an argument for
> the number of workers is because I believed that the caller should
> always know how many workers they're going to want at the time they
> CreateParallelContext(). Increasing it later is not possible, because
> the DSM has already sized based on the count provided. I grant that it
> would be possible to allow the number to be reduced later, but why
> would you want to do that? Why not get the number right when first
> creating the DSM?
>

Here, we have a need to reduce the number of workers.  Index vacuum
has two different phases (index vacuum and index cleanup) which use
the same parallel context/DSM, but both could have different
requirements for workers.  The second phase (cleanup) would normally
need fewer workers: if the work is done in the first phase, the second
one wouldn't need it, but we have exceptions like gin indexes where we
need workers for the second phase as well because it makes a pass over
the index again even if we have cleaned the index in the first phase.
Now, consider the case where we have 3 btree indexes and 2 gin
indexes: we would need 5 workers for the index vacuum phase and 2
workers for the index cleanup phase.  There are other cases too.

We also considered having a separate DSM for each phase, but that
appeared to have overhead without much benefit.
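
A minimal sketch of that flow (assuming the patch's proposed LaunchParallelWorkers(pcxt, nworkers) signature and the lps/lvshared layout from the hunks quoted earlier; details may differ from the actual patch):

/* Index vacuum phase: e.g. 5 workers for 3 btree + 2 gin indexes. */
LaunchParallelWorkers(lps->pcxt, nworkers_bulkdel);
/* ... the leader also processes indexes, then ... */
WaitForParallelWorkersToFinish(lps->pcxt);

/* Reuse the same parallel context/DSM for the cleanup phase. */
ReinitializeParallelDSM(lps->pcxt);

/* Index cleanup phase: e.g. only 2 workers, for the gin indexes. */
LaunchParallelWorkers(lps->pcxt, nworkers_cleanup);
WaitForParallelWorkersToFinish(lps->pcxt);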

> Is there any legitimate use case for parallel vacuum in combination
> with vacuum cost delay?
>

Yeah, we also initially thought that it is not legitimate to use a
parallel vacuum with a cost delay.  But to get a wider view, we
started a separate thread [2] and there we reached the conclusion
that we need a solution for throttling [3].

>
> + * vacuum and for performing index cleanup.  Note that all parallel workers
> + * live during either index vacuuming or index cleanup but the leader process
> + * neither exits from the parallel mode nor destroys the parallel context.
> + * For updating the index statistics, since any updates are not allowed during
> + * parallel mode we update the index statistics after exited from the parallel
>
> The first of these sentences ("Note that all...") is not very clear to
> me, and seems like it may amount to a statement that the leader
> doesn't try to destroy the parallel context too early, but since I
> don't understand it, maybe that's not what it is saying.
>

Your understanding is correct.  How about if we modify it to something
like: "Note that parallel workers are alive only during index vacuum
or index cleanup but the leader process neither exits from the
parallel mode nor destroys the parallel context until the entire
parallel operation is finished." OR something like "The leader backend
holds the parallel context till the index vacuum and cleanup is
finished.  Both index vacuum and cleanup separately perform the work
with parallel workers."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Dec 5, 2019 at 1:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Dec 2, 2019 at 2:26 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > It's just an example, I'm not saying your idea is bad. ISTM the idea
> > is good on an assumption that all indexes take the same time or take a
> > long time so I'd also like to consider if this is true even in
> > production and which approaches is better if we don't have such
> > assumption.
>
> I think his idea is good. You're not wrong when you say that there are
> cases where it could work out badly, but I think on the whole it's a
> clear improvement. Generally, the indexes should be of relatively
> similar size because index size is driven by table size; it's true
> that different AMs could result in different-size indexes, but it
> seems like a stretch to suppose that the indexes that don't support
> parallelism are also going to be the little tiny ones that go fast
> anyway, unless we have some evidence that this is really true. I also
> wonder whether we really need the option to disable parallel vacuum in
> the first place. Maybe I'm not looking in the right place, but I'm not
> finding anything in the way of comments or documentation explaining
> why some AMs don't support it. It's an important design point, and
> should be documented.
>
> I also think PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION seems like a
> waste of space. For parallel queries, there is a trade-off between
> having the leader do work (which can speed up the query) and having it
> remain idle so that it can immediately absorb tuples from workers and
> keep them from having their tuple queues fill up (which can speed up
> the query). But here, at least as I understand it, there's no such
> trade-off. Having the leader fail to participate is just a loser.
> Maybe it's useful to test while debugging the patch, but why should
> the committed code support it?
>
> To respond to another point from a different part of the email chain,
> the reason why LaunchParallelWorkers() does not take an argument for
> the number of workers is because I believed that the caller should
> always know how many workers they're going to want at the time they
> CreateParallelContext(). Increasing it later is not possible, because
> the DSM has already sized based on the count provided. I grant that it
> would be possible to allow the number to be reduced later, but why
> would you want to do that? Why not get the number right when first
> creating the DSM?
>
> Is there any legitimate use case for parallel vacuum in combination
> with vacuum cost delay? As I understand it, any serious vacuuming is
> going to be I/O-bound, so can you really need multiple workers at the
> same time that you are limiting the I/O rate? Perhaps it's possible if
> the I/O limit is so high that a single worker can't hit the limit by
> itself, but multiple workers can, but it seems like a bad idea to
> spawn more workers and then throttle them rather than just starting
> fewer workers.

I agree that there is no point in first spawning more workers to get
the work done faster and later throttling them.  Basically, that will
lose the whole purpose of running it in parallel.  OTOH, we should
also consider the cases where some vacuum may not hit the I/O limit,
right?  Because it may find all the pages in shared buffers and might
not need to dirty a lot of pages.  So I think for such cases it is
advantageous to run in parallel.  The problem is that there is no way
to know in advance whether the total I/O for the vacuum will hit the
I/O limit or not, so we cannot decide in advance whether to run it in
parallel or not.

> In any case, the algorithm suggested in vacuumlazy.c
> around the definition of VacuumSharedCostBalance seems almost the
> opposite of what you probably want. The idea there seems to be that
> you shouldn't make a worker sleep if it hasn't actually got to do
> anything. Apparently the idea is that if you have 3 workers and you
> only have enough I/O rate for 1 worker, you want all 3 workers to run
> at once, so that the I/O is random, rather than having them run 1 at a
> time, so that the I/O is sequential. That seems counterintuitive. It
> could be right if the indexes are in different tablespaces, but if
> they are in the same tablespace it's probably wrong. I guess it could
> still be right if there's just so much I/O that you aren't going to
> run out ever, and the more important consideration is that you don't
> know which index will take longer to vacuum and so want to start them
> all at the same time so that you don't accidentally start the slow one
> last, but that sounds like a stretch. I think this whole area needs
> more thought. I feel like we're trying to jam a go-slower feature and
> a go-faster feature into the same box.
>
> + * vacuum and for performing index cleanup.  Note that all parallel workers
> + * live during either index vacuuming or index cleanup but the leader process
> + * neither exits from the parallel mode nor destroys the parallel context.
> + * For updating the index statistics, since any updates are not allowed during
> + * parallel mode we update the index statistics after exited from the parallel
>
> The first of these sentences ("Note that all...") is not very clear to
> me, and seems like it may amount to a statement that the leader
> doesn't try to destroy the parallel context too early, but since I
> don't understand it, maybe that's not what it is saying. The second
> sentence needs exited -> exiting, and maybe some more work on the
> grammar, too.
>



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Dec 5, 2019 at 10:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 5, 2019 at 1:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Dec 2, 2019 at 2:26 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > > It's just an example, I'm not saying your idea is bad. ISTM the idea
> > > is good on an assumption that all indexes take the same time or take a
> > > long time so I'd also like to consider if this is true even in
> > > production and which approaches is better if we don't have such
> > > assumption.
> >
> > I think his idea is good. You're not wrong when you say that there are
> > cases where it could work out badly, but I think on the whole it's a
> > clear improvement. Generally, the indexes should be of relatively
> > similar size because index size is driven by table size; it's true
> > that different AMs could result in different-size indexes, but it
> > seems like a stretch to suppose that the indexes that don't support
> > parallelism are also going to be the little tiny ones that go fast
> > anyway, unless we have some evidence that this is really true. I also
> > wonder whether we really need the option to disable parallel vacuum in
> > the first place.
> >
>
> I think it could be required for the cases where the AM doesn't have a
> way (or it is difficult to come up with a way) to communicate the
> stats allocated by the first ambulkdelete call to the subsequent ones
> until amvacuumcleanup.  Currently, we have such a case for the Gist
> index, see email thread [1].
>

oops, I had referred to a couple of other discussions in my reply but
forgot to mention the links, doing it now.

[1] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0%2Bw%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/20191106022550.zq7nai2ct2ashegq%40alap3.anarazel.de

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Dec 5, 2019 at 12:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 21, 2019 at 12:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch,  I
> > am a bit doubtful about this kind of arrangement, where the code in
> > the "if" is always unreachable with the current AMs.  I am not sure
> > what is the best way to handle this, should we just drop the
> > amestimateparallelvacuum altogether?  Because currently, we are just
> > providing a size estimate function without a copy function,  even if
> > the in future some Am give an estimate about the size of the stats, we
> > can not directly memcpy the stat from the local memory to the shared
> > memory, we might then need a copy function also from the AM so that it
> > can flatten the stats and store in proper format?
>
> I agree that it's a crock to add an AM method that is never used for
> anything. That's just asking for the design to prove buggy and
> inadequate. One way to avoid this would be to require that every AM
> that wants to support parallel vacuuming supply this method, and if it
> wants to just return sizeof(IndexBulkDeleteResult), then it can. But I
> also think someone should modify one of the AMs to use a
> differently-sized object, and then see whether they can really make
> parallel vacuum work with this patch. If, as you speculated here, it
> needs another API, then we should add both of them or neither. A
> half-baked solution is worse than nothing at all.
>

It is possible that we need another API to make it work, as is
currently the case for the Gist index where we need to somehow first
serialize it (although, as mentioned earlier, we now have a way to
avoid serializing it).  However, if it is some simple case where there
are only some additional constants apart from IndexBulkDeleteResult,
then we don't need it.  I think we were cautious not to expose more
APIs unless there is a real need, but I guess it is better to
completely avoid such cases and not expose any API unless we have some
examples.  In any case, the user will have the facility to disable
parallel vacuum for such cases.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Dec 5, 2019 at 12:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think it could be required for the cases where the AM doesn't have a
> way (or it is difficult to come up with a way) to communicate the
> stats allocated by the first ambulkdelete call to the subsequent ones
> until amvacuumcleanup.  Currently, we have such a case for the Gist
> index, see email thread [1]. Though we have come up with a way to
> avoid that for Gist indexes, I am not sure if we can assume that it is
> the case for any possible index AM especially when there is a
> provision that indexAM can have additional stats information.  In the
> worst case, if we have to modify some existing index AM like we did
> for the Gist index, we need such a provision so that it is possible.
> In the ideal case, the index AM should provide a way to copy such
> stats, but we can't assume that, so we come up with this option.
>
> We have used this for dummy_index_am which also provides a way to test it.

I think it might be a good idea to change what we expect index AMs to
do rather than trying to make anything that they happen to be doing
right now work, no matter how crazy. In particular, suppose we say
that you CAN'T add data on to the end of IndexBulkDeleteResult any
more, and that instead the extra data is passed through a separate
parameter. And then you add an estimate method that gives the size of
the space provided by that parameter (and if the estimate method isn't
defined then the extra parameter is passed as NULL) and document that
the data stored there might get flat-copied. Now, you've taken the
onus off of parallel vacuum to cope with any crazy thing a
hypothetical AM might be doing, and instead you've defined the
behavior of that hypothetical AM as wrong. If somebody really needs
that, it's now their job to modify the index AM machinery further
instead of your job to somehow cope.
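
To make that concrete, the contract I'm imagining is roughly along these
lines (a sketch only; the names and the extra argument are illustrative,
not a committed API):

    /*
     * Sketch only.  The AM advertises how much extra, flat-copyable space
     * it wants; vacuum allocates that space (in DSM for a parallel vacuum)
     * and hands it to the AM through a separate argument instead of letting
     * the AM tack bytes onto the end of IndexBulkDeleteResult.
     */
    typedef Size (*amestimateparallelvacuum_function) (Relation indexRelation);

    /*
     * Existing callback shown with a hypothetical extra argument; the
     * pointer is NULL when the AM defines no estimate method.
     */
    typedef IndexBulkDeleteResult *(*ambulkdelete_function) (IndexVacuumInfo *info,
                                                             IndexBulkDeleteResult *stats,
                                                             IndexBulkDeleteCallback callback,
                                                             void *callback_state,
                                                             void *am_extra);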

> Here, we have a need to reduce the number of workers.  Index Vacuum
> has two different phases (index vacuum and index cleanup) which uses
> the same parallel-context/DSM but both could have different
> requirements for workers.  The second phase (cleanup) would normally
> need fewer workers as if the work is done in the first phase, second
> wouldn't need it, but we have exceptions like gin indexes where we
> need it for the second phase as well because it takes the pass
> over-index again even if we have cleaned the index in the first phase.
> Now, consider the case where we have 3 btree indexes and 2 gin
> indexes, we would need 5 workers for index vacuum phase and 2 workers
> for index cleanup phase.  There are other cases too.
>
> We also considered to have a separate DSM for each phase, but that
> appeared to have overhead without much benefit.

How about adding an additional argument to ReinitializeParallelDSM()
that allows the number of workers to be reduced? That seems like it
would be less confusing than what you have now, and would involve
modifying code in a lot fewer places.
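
For what it's worth, the kind of signature change I have in mind is
roughly this (illustrative sketch only; today the function takes just the
context, and nworkers_to_launch is a made-up field name):

    void
    ReinitializeParallelDSM(ParallelContext *pcxt, int nworkers_to_launch)
    {
        /*
         * Sketch only: the DSM stays sized for the worker count the context
         * was created with; we merely lower the number that will actually
         * be launched next time.
         */
        Assert(nworkers_to_launch <= pcxt->nworkers);
        pcxt->nworkers_to_launch = nworkers_to_launch;

        /* ... existing reinitialization of error/tuple queues etc. ... */
    }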

> > Is there any legitimate use case for parallel vacuum in combination
> > with vacuum cost delay?
> >
>
> Yeah, we also initially thought that it is not legitimate to use a
> parallel vacuum with a cost delay.  But to get a wider view, we
> started a separate thread [2] and there we reach to the conclusion
> that we need a solution for throttling [3].

OK, thanks for the pointer. This doesn't address the other part of my
complaint, though, which is that the whole discussion between you and
Dilip and Sawada-san presumes that you want the delays ought to be
scattered across the workers roughly in proportion to their share of
the I/O, and it seems NOT AT ALL clear that this is actually a
desirable property. You're all assuming that, but none of you has
justified it, and I think the opposite might be true in some cases.
You're adding extra complexity for something that isn't a clear
improvement.

> Your understanding is correct.  How about if we modify it to something
> like: "Note that parallel workers are alive only during index vacuum
> or index cleanup but the leader process neither exits from the
> parallel mode nor destroys the parallel context until the entire
> parallel operation is finished." OR something like "The leader backend
> holds the parallel context till the index vacuum and cleanup is
> finished.  Both index vacuum and cleanup separately perform the work
> with parallel workers."

How about if you just delete it? You don't need a comment explaining
that this caller of CreateParallelContext() does something which
*every* caller of CreateParallelContext() must do. If you didn't do
that, you'd fail assertions and everything would break, so *of course*
you are doing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
[ Please trim excess quoted material from your replies. ]

On Thu, Dec 5, 2019 at 12:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I agree that there is no point is first to spawn more workers to get
> the work done faster and later throttle them.  Basically, that will
> lose the whole purpose of running it in parallel.

Right.  I mean if you throttle something that would have otherwise
kept 3 workers running full blast back to the point where it uses the
equivalent of 2.5 workers, that might make sense. It's a little
marginal, maybe, but sure. But once you throttle it back to <= 2
workers, you're just wasting resources.

I think my concern here is ultimately more about usability than
whether or not we allow throttling. I agree that there are some
possible cases where throttling a parallel vacuum is useful, so I
guess we should support it. But I also think there's a real risk of
people not realizing that throttling is happening and then being sad
because they used parallel VACUUM and it was still slow. I think we
should document explicitly that parallel VACUUM is still potentially
throttled and that you should consider setting the cost delay to a
higher value or 0 before using it.

We might even want to add a FAST option (or similar) to VACUUM that
makes it behave as if vacuum_cost_delay = 0, and add something to the
examples section for VACUUM that suggests e.g.

VACUUM (PARALLEL 3, FAST) my_big_table
Vacuum my_big_table with 3 workers and with resource throttling
disabled for maximum performance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Thu, 5 Dec 2019 at 19:54, Robert Haas <robertmhaas@gmail.com> wrote:
>
> [ Please trim excess quoted material from your replies. ]
>
> On Thu, Dec 5, 2019 at 12:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I agree that there is no point is first to spawn more workers to get
> > the work done faster and later throttle them.  Basically, that will
> > lose the whole purpose of running it in parallel.
>
> Right.  I mean if you throttle something that would have otherwise
> kept 3 workers running full blast back to the point where it uses the
> equivalent of 2.5 workers, that might make sense. It's a little
> marginal, maybe, but sure. But once you throttle it back to <= 2
> workers, you're just wasting resources.
>
> I think my concern here is ultimately more about usability than
> whether or not we allow throttling. I agree that there are some
> possible cases where throttling a parallel vacuum is useful, so I
> guess we should support it. But I also think there's a real risk of
> people not realizing that throttling is happening and then being sad
> because they used parallel VACUUM and it was still slow. I think we
> should document explicitly that parallel VACUUM is still potentially
> throttled and that you should consider setting the cost delay to a
> higher value or 0 before using it.
>
> We might even want to add a FAST option (or similar) to VACUUM that
> makes it behave as if vacuum_cost_delay = 0, and add something to the
> examples section for VACUUM that suggests e.g.
>
> VACUUM (PARALLEL 3, FAST) my_big_table
> Vacuum my_big_table with 3 workers and with resource throttling
> disabled for maximum performance.
>

Please find some review comments for the v35 patch set.

1.
+    /* Return immediately when parallelism disabled */
+    if (max_parallel_maintenance_workers == 0)
+        return 0;
+
Here, we should also check max_worker_processes, because if
max_worker_processes is set to 0 then we can't launch any workers, so
we should return from here.

2.
+    /* cap by max_parallel_maintenace_workers */
+    parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
Here also, we should consider max_worker_processes when calculating
parallel_workers (by default, max_worker_processes = 8); see the sketch
after this comment.
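
Something along these lines is what I mean (illustrative only, not patch code):

    /* Sketch: cap by both GUCs, not only max_parallel_maintenance_workers */
    parallel_workers = Min(parallel_workers,
                           Min(max_parallel_maintenance_workers,
                               max_worker_processes));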

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Dec 6, 2019 at 12:55 AM Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Thu, 5 Dec 2019 at 19:54, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > [ Please trim excess quoted material from your replies. ]
> >
> > On Thu, Dec 5, 2019 at 12:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > I agree that there is no point is first to spawn more workers to get
> > > the work done faster and later throttle them.  Basically, that will
> > > lose the whole purpose of running it in parallel.
> >
> > Right.  I mean if you throttle something that would have otherwise
> > kept 3 workers running full blast back to the point where it uses the
> > equivalent of 2.5 workers, that might make sense. It's a little
> > marginal, maybe, but sure. But once you throttle it back to <= 2
> > workers, you're just wasting resources.
> >
> > I think my concern here is ultimately more about usability than
> > whether or not we allow throttling. I agree that there are some
> > possible cases where throttling a parallel vacuum is useful, so I
> > guess we should support it. But I also think there's a real risk of
> > people not realizing that throttling is happening and then being sad
> > because they used parallel VACUUM and it was still slow. I think we
> > should document explicitly that parallel VACUUM is still potentially
> > throttled and that you should consider setting the cost delay to a
> > higher value or 0 before using it.
> >
> > We might even want to add a FAST option (or similar) to VACUUM that
> > makes it behave as if vacuum_cost_delay = 0, and add something to the
> > examples section for VACUUM that suggests e.g.
> >
> > VACUUM (PARALLEL 3, FAST) my_big_table
> > Vacuum my_big_table with 3 workers and with resource throttling
> > disabled for maximum performance.
> >
>
> Please find  some review comments for v35 patch set
>
> 1.
> +    /* Return immediately when parallelism disabled */
> +    if (max_parallel_maintenance_workers == 0)
> +        return 0;
> +
> Here, we should add check of max_worker_processes because if
> max_worker_processes is set as 0, then we can't launch any worker so
> we should return from here.
>
> 2.
> +    /* cap by max_parallel_maintenace_workers */
> +    parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
> +
> Here also, we should consider max_worker_processes to calculate
> parallel_workers. (by default, max_worker_processes = 8)

IMHO, it's enough to cap with max_parallel_maintenance_workers.  So I
think it's the user's responsibility to keep
max_parallel_maintenance_workers under the max_worker_processes limit.
And, if the user fails to do so, or enough workers are not available,
then LaunchParallelWorkers() will take care of it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I think it might be a good idea to change what we expect index AMs to
> do rather than trying to make anything that they happen to be doing
> right now work, no matter how crazy. In particular, suppose we say
> that you CAN'T add data on to the end of IndexBulkDeleteResult any
> more, and that instead the extra data is passed through a separate
> parameter. And then you add an estimate method that gives the size of
> the space provided by that parameter (and if the estimate method isn't
> defined then the extra parameter is passed as NULL) and document that
> the data stored there might get flat-copied.
>

I think this is a good idea and serves the purpose we are trying to
achieve currently.  However, if there is any index AM that is using
the current way to pass stats with additional information, it would
need to change even if it doesn't want to use the parallel vacuum
functionality (say because its indexes are too small, or for whatever
other reason).  I think this is a reasonable trade-off and the changes
on their end won't be that big.  So, we should do this.

> Now, you've taken the
> onus off of parallel vacuum to cope with any crazy thing a
> hypothetical AM might be doing, and instead you've defined the
> behavior of that hypothetical AM as wrong. If somebody really needs
> that, it's now their job to modify the index AM machinery further
> instead of your job to somehow cope.
>

makes sense.

> > Here, we have a need to reduce the number of workers.  Index Vacuum
> > has two different phases (index vacuum and index cleanup) which uses
> > the same parallel-context/DSM but both could have different
> > requirements for workers.  The second phase (cleanup) would normally
> > need fewer workers as if the work is done in the first phase, second
> > wouldn't need it, but we have exceptions like gin indexes where we
> > need it for the second phase as well because it takes the pass
> > over-index again even if we have cleaned the index in the first phase.
> > Now, consider the case where we have 3 btree indexes and 2 gin
> > indexes, we would need 5 workers for index vacuum phase and 2 workers
> > for index cleanup phase.  There are other cases too.
> >
> > We also considered to have a separate DSM for each phase, but that
> > appeared to have overhead without much benefit.
>
> How about adding an additional argument to ReinitializeParallelDSM()
> that allows the number of workers to be reduced? That seems like it
> would be less confusing than what you have now, and would involve
> modify code in a lot fewer places.
>

Yeah, we can do that.  We can maintain some information in
LVParallelState which indicates whether we need to reinitialize the
DSM before launching workers.  Sawada-San, do you see any problem with
this idea?


> > > Is there any legitimate use case for parallel vacuum in combination
> > > with vacuum cost delay?
> > >
> >
> > Yeah, we also initially thought that it is not legitimate to use a
> > parallel vacuum with a cost delay.  But to get a wider view, we
> > started a separate thread [2] and there we reach to the conclusion
> > that we need a solution for throttling [3].
>
> OK, thanks for the pointer. This doesn't address the other part of my
> complaint, though, which is that the whole discussion between you and
> Dilip and Sawada-san presumes that you want the delays ought to be
> scattered across the workers roughly in proportion to their share of
> the I/O, and it seems NOT AT ALL clear that this is actually a
> desirable property. You're all assuming that, but none of you has
> justified it, and I think the opposite might be true in some cases.
>

IIUC, your complaint is that in some cases, even if the I/O rate is
enough for one worker, we will still launch more workers and throttle
them.  The point is that we can't know in advance how much I/O is
required for each index.  We can try to estimate that based on index
size, but I don't think that will be right, because it is possible
that for a bigger index we don't need to dirty the pages and most of
the pages are in shared buffers, etc.  The current algorithm won't use
more I/O than required; it will be good for cases where one or some of
the indexes are doing more I/O as compared to others, and it will also
work equally well when the indexes have a similar amount of work.  I
think we could do better if we could predict how much I/O each index
requires before actually scanning the index.
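
To make the current behavior concrete, the shared cost-based delay is
conceptually along these lines (a heavily simplified sketch, not the
actual patch code; the function name is made up):

    /*
     * Heavily simplified sketch of the shared cost-based delay.  Workers
     * accumulate cost into a balance shared through DSM, so the total I/O
     * across all workers is what gets compared against vacuum_cost_limit.
     * When the shared balance crosses the limit, a worker sleeps roughly
     * in proportion to the cost it contributed itself, so a worker that
     * did little I/O sleeps little.
     */
    static void
    parallel_vacuum_delay_point(pg_atomic_uint32 *shared_balance,
                                int *local_balance)
    {
        uint32      total;

        if (VacuumCostDelay <= 0)
            return;

        /* publish the cost accumulated locally since the last check */
        total = pg_atomic_add_fetch_u32(shared_balance, VacuumCostBalance);
        *local_balance += VacuumCostBalance;
        VacuumCostBalance = 0;

        if (total >= (uint32) VacuumCostLimit)
        {
            double      msec;

            /* sleep in proportion to this worker's share of the total cost */
            msec = VacuumCostDelay * (*local_balance) / total;
            pg_usleep((long) (msec * 1000));

            /* consume what we slept for from both balances */
            pg_atomic_sub_fetch_u32(shared_balance, *local_balance);
            *local_balance = 0;
        }
    }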

I agree with the other points (add a FAST option for parallel vacuum
and document that parallel vacuum is still potentially throttled ...)
you made in a separate email.


> You're adding extra complexity for something that isn't a clear
> improvement.
>
> > Your understanding is correct.  How about if we modify it to something
> > like: "Note that parallel workers are alive only during index vacuum
> > or index cleanup but the leader process neither exits from the
> > parallel mode nor destroys the parallel context until the entire
> > parallel operation is finished." OR something like "The leader backend
> > holds the parallel context till the index vacuum and cleanup is
> > finished.  Both index vacuum and cleanup separately perform the work
> > with parallel workers."
>
> How about if you just delete it? You don't need a comment explaining
> that this caller of CreateParallelContext() does something which
> *every* caller of CreateParallelContext() must do. If you didn't do
> that, you'd fail assertions and everything would break, so *of course*
> you are doing it.
>

Fair enough, we can just remove this part of the comment.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Fri, 6 Dec 2019 at 10:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > I think it might be a good idea to change what we expect index AMs to
> > do rather than trying to make anything that they happen to be doing
> > right now work, no matter how crazy. In particular, suppose we say
> > that you CAN'T add data on to the end of IndexBulkDeleteResult any
> > more, and that instead the extra data is passed through a separate
> > parameter. And then you add an estimate method that gives the size of
> > the space provided by that parameter (and if the estimate method isn't
> > defined then the extra parameter is passed as NULL) and document that
> > the data stored there might get flat-copied.
> >
>
> I think this is a good idea and serves the purpose we are trying to
> achieve currently.  However, if there are any IndexAM that is using
> the current way to pass stats with additional information, they would
> need to change even if they don't want to use parallel vacuum
> functionality (say because their indexes are too small or whatever
> other reasons).  I think this is a reasonable trade-off and the
> changes on their end won't be that big.  So, we should do this.
>
> > Now, you've taken the
> > onus off of parallel vacuum to cope with any crazy thing a
> > hypothetical AM might be doing, and instead you've defined the
> > behavior of that hypothetical AM as wrong. If somebody really needs
> > that, it's now their job to modify the index AM machinery further
> > instead of your job to somehow cope.
> >
>
> makes sense.
>
> > > Here, we have a need to reduce the number of workers.  Index Vacuum
> > > has two different phases (index vacuum and index cleanup) which uses
> > > the same parallel-context/DSM but both could have different
> > > requirements for workers.  The second phase (cleanup) would normally
> > > need fewer workers as if the work is done in the first phase, second
> > > wouldn't need it, but we have exceptions like gin indexes where we
> > > need it for the second phase as well because it takes the pass
> > > over-index again even if we have cleaned the index in the first phase.
> > > Now, consider the case where we have 3 btree indexes and 2 gin
> > > indexes, we would need 5 workers for index vacuum phase and 2 workers
> > > for index cleanup phase.  There are other cases too.
> > >
> > > We also considered to have a separate DSM for each phase, but that
> > > appeared to have overhead without much benefit.
> >
> > How about adding an additional argument to ReinitializeParallelDSM()
> > that allows the number of workers to be reduced? That seems like it
> > would be less confusing than what you have now, and would involve
> > modify code in a lot fewer places.
> >
>
> Yeah, we can do that.  We can maintain some information in
> LVParallelState which indicates whether we need to reinitialize the
> DSM before launching workers.  Sawada-San, do you see any problem with
> this idea?
>
>
> > > > Is there any legitimate use case for parallel vacuum in combination
> > > > with vacuum cost delay?
> > > >
> > >
> > > Yeah, we also initially thought that it is not legitimate to use a
> > > parallel vacuum with a cost delay.  But to get a wider view, we
> > > started a separate thread [2] and there we reach to the conclusion
> > > that we need a solution for throttling [3].
> >
> > OK, thanks for the pointer. This doesn't address the other part of my
> > complaint, though, which is that the whole discussion between you and
> > Dilip and Sawada-san presumes that you want the delays ought to be
> > scattered across the workers roughly in proportion to their share of
> > the I/O, and it seems NOT AT ALL clear that this is actually a
> > desirable property. You're all assuming that, but none of you has
> > justified it, and I think the opposite might be true in some cases.
> >
>
> IIUC, your complaint is that in some cases, even if the I/O rate is
> enough for one worker, we will still launch more workers and throttle
> them.  The point is we can't know in advance how much I/O is required
> for each index.  We can try to do that based on index size, but I
> don't think that will be right because it is possible that for the
> bigger index, we don't need to dirty the pages and most of the pages
> are in shared buffers, etc.  The current algorithm won't use more I/O
> than required and it will be good for cases where one or some of the
> indexes are doing more I/O as compared to others and it will also work
> equally good even when the indexes have a similar amount of work.  I
> think we could do better if we can predict how much I/O each index
> requires before actually scanning the index.
>
> I agree with the other points (add a FAST option for parallel vacuum
> and document that parallel vacuum is still potentially throttled ...)
> you made in a separate email.
>
>
> > You're adding extra complexity for something that isn't a clear
> > improvement.
> >
> > > Your understanding is correct.  How about if we modify it to something
> > > like: "Note that parallel workers are alive only during index vacuum
> > > or index cleanup but the leader process neither exits from the
> > > parallel mode nor destroys the parallel context until the entire
> > > parallel operation is finished." OR something like "The leader backend
> > > holds the parallel context till the index vacuum and cleanup is
> > > finished.  Both index vacuum and cleanup separately perform the work
> > > with parallel workers."
> >
> > How about if you just delete it? You don't need a comment explaining
> > that this caller of CreateParallelContext() does something which
> > *every* caller of CreateParallelContext() must do. If you didn't do
> > that, you'd fail assertions and everything would break, so *of course*
> > you are doing it.
> >
>
> Fair enough, we can just remove this part of the comment.
>

Hi All,
Below is a brief summary of testing the v35 patch set.

1.
All the test cases pass on top of the v35 patch set (make check-world and all contrib test cases).

2.
With PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION enabled, "make check-world" passes.

3.
With the v35 patch, the vacuum.sql regression test takes too much time because of the large number of inserts, so we can reduce that time by reducing the number of tuples.
+INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i;

Here, instead of 100000, we can use 1000 to reduce the time of this test case, because we only want to test the code and functionality.

4.
I tested the functionality of parallel vacuum with different server configuration settings and the behavior is as expected:
shared_buffers, max_parallel_workers, max_parallel_maintenance_workers, vacuum_cost_limit, vacuum_cost_delay, maintenance_work_mem, max_worker_processes

5.
Index and table stats of parallel vacuum match those of a normal vacuum.

postgres=# select * from pg_statio_all_tables where relname = 'test';
relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
16384 | public | test | 399 | 5000 | 3 | 0 | 0 | 0 | 0 | 0
(1 row)

6.
Vacuum progress reporting is as expected.
postgres=# select * from pg_stat_progress_vacuum;
  pid  | datid | datname  | relid |        phase        | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
-------+-------+----------+-------+---------------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
 44161 | 13577 | postgres | 16384 | cleaning up indexes |           41650 |             41650 |              41650 |                  1 |        11184810 |         1000000
(1 row)

7.
If any worker (or the leader) gets an error, then all the workers exit immediately and the action is marked as aborted.

8.
I tested parallel vacuum for all types of indexes and with varying index sizes; all are working and I didn't see any unexpected behavior.

9.
While testing, I found that even if we delete all the tuples from a table, the size of the btree indexes does not shrink.

Delete all tuples from the table.
Before vacuum, total pages in the btree index: 8000
After vacuum (normal/parallel), total pages in the btree index: 8000
The size of the table, however, does shrink after deleting all the tuples.
Can we add a check in vacuum to truncate all the pages of btree indexes if there are no tuples in the table?

Please let me know if you have any inputs for more testing.

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
Sorry for the late reply.

On Fri, 6 Dec 2019 at 14:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > I think it might be a good idea to change what we expect index AMs to
> > do rather than trying to make anything that they happen to be doing
> > right now work, no matter how crazy. In particular, suppose we say
> > that you CAN'T add data on to the end of IndexBulkDeleteResult any
> > more, and that instead the extra data is passed through a separate
> > parameter. And then you add an estimate method that gives the size of
> > the space provided by that parameter (and if the estimate method isn't
> > defined then the extra parameter is passed as NULL) and document that
> > the data stored there might get flat-copied.
> >
>
> I think this is a good idea and serves the purpose we are trying to
> achieve currently.  However, if there are any IndexAM that is using
> the current way to pass stats with additional information, they would
> need to change even if they don't want to use parallel vacuum
> functionality (say because their indexes are too small or whatever
> other reasons).  I think this is a reasonable trade-off and the
> changes on their end won't be that big.  So, we should do this.
>
> > Now, you've taken the
> > onus off of parallel vacuum to cope with any crazy thing a
> > hypothetical AM might be doing, and instead you've defined the
> > behavior of that hypothetical AM as wrong. If somebody really needs
> > that, it's now their job to modify the index AM machinery further
> > instead of your job to somehow cope.
> >
>
> makes sense.
>
> > > Here, we have a need to reduce the number of workers.  Index Vacuum
> > > has two different phases (index vacuum and index cleanup) which uses
> > > the same parallel-context/DSM but both could have different
> > > requirements for workers.  The second phase (cleanup) would normally
> > > need fewer workers as if the work is done in the first phase, second
> > > wouldn't need it, but we have exceptions like gin indexes where we
> > > need it for the second phase as well because it takes the pass
> > > over-index again even if we have cleaned the index in the first phase.
> > > Now, consider the case where we have 3 btree indexes and 2 gin
> > > indexes, we would need 5 workers for index vacuum phase and 2 workers
> > > for index cleanup phase.  There are other cases too.
> > >
> > > We also considered to have a separate DSM for each phase, but that
> > > appeared to have overhead without much benefit.
> >
> > How about adding an additional argument to ReinitializeParallelDSM()
> > that allows the number of workers to be reduced? That seems like it
> > would be less confusing than what you have now, and would involve
> > modify code in a lot fewer places.
> >
>
> Yeah, we can do that.  We can maintain some information in
> LVParallelState which indicates whether we need to reinitialize the
> DSM before launching workers.  Sawada-San, do you see any problem with
> this idea?

I think the number of workers could be increased in cleanup phase. For
example, if we have 1 brin index and 2 gin indexes then in bulkdelete
phase we need only 1 worker but in cleanup we need 2 workers.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Dec 13, 2019 at 10:03 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> Sorry for the late reply.
>
> On Fri, 6 Dec 2019 at 14:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > > > Here, we have a need to reduce the number of workers.  Index Vacuum
> > > > has two different phases (index vacuum and index cleanup) which uses
> > > > the same parallel-context/DSM but both could have different
> > > > requirements for workers.  The second phase (cleanup) would normally
> > > > need fewer workers as if the work is done in the first phase, second
> > > > wouldn't need it, but we have exceptions like gin indexes where we
> > > > need it for the second phase as well because it takes the pass
> > > > over-index again even if we have cleaned the index in the first phase.
> > > > Now, consider the case where we have 3 btree indexes and 2 gin
> > > > indexes, we would need 5 workers for index vacuum phase and 2 workers
> > > > for index cleanup phase.  There are other cases too.
> > > >
> > > > We also considered to have a separate DSM for each phase, but that
> > > > appeared to have overhead without much benefit.
> > >
> > > How about adding an additional argument to ReinitializeParallelDSM()
> > > that allows the number of workers to be reduced? That seems like it
> > > would be less confusing than what you have now, and would involve
> > > modify code in a lot fewer places.
> > >
> >
> > Yeah, we can do that.  We can maintain some information in
> > LVParallelState which indicates whether we need to reinitialize the
> > DSM before launching workers.  Sawada-San, do you see any problem with
> > this idea?
>
> I think the number of workers could be increased in cleanup phase. For
> example, if we have 1 brin index and 2 gin indexes then in bulkdelete
> phase we need only 1 worker but in cleanup we need 2 workers.
>

I think it shouldn't be more than the number with which we have
created a parallel context, no?  If that is the case, then I think it
should be fine.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, 13 Dec 2019 at 14:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 13, 2019 at 10:03 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Sorry for the late reply.
> >
> > On Fri, 6 Dec 2019 at 14:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > > > Here, we have a need to reduce the number of workers.  Index Vacuum
> > > > > has two different phases (index vacuum and index cleanup) which uses
> > > > > the same parallel-context/DSM but both could have different
> > > > > requirements for workers.  The second phase (cleanup) would normally
> > > > > need fewer workers as if the work is done in the first phase, second
> > > > > wouldn't need it, but we have exceptions like gin indexes where we
> > > > > need it for the second phase as well because it takes the pass
> > > > > over-index again even if we have cleaned the index in the first phase.
> > > > > Now, consider the case where we have 3 btree indexes and 2 gin
> > > > > indexes, we would need 5 workers for index vacuum phase and 2 workers
> > > > > for index cleanup phase.  There are other cases too.
> > > > >
> > > > > We also considered to have a separate DSM for each phase, but that
> > > > > appeared to have overhead without much benefit.
> > > >
> > > > How about adding an additional argument to ReinitializeParallelDSM()
> > > > that allows the number of workers to be reduced? That seems like it
> > > > would be less confusing than what you have now, and would involve
> > > > modify code in a lot fewer places.
> > > >
> > >
> > > Yeah, we can do that.  We can maintain some information in
> > > LVParallelState which indicates whether we need to reinitialize the
> > > DSM before launching workers.  Sawada-San, do you see any problem with
> > > this idea?
> >
> > I think the number of workers could be increased in cleanup phase. For
> > example, if we have 1 brin index and 2 gin indexes then in bulkdelete
> > phase we need only 1 worker but in cleanup we need 2 workers.
> >
>
> I think it shouldn't be more than the number with which we have
> created a parallel context, no?  If that is the case, then I think it
> should be fine.

Right. I thought that ReinitializeParallelDSM() with an additional
argument would reduce the DSM, but I understand that it doesn't
actually reduce the DSM; it just sets a variable for the number of
workers to launch, is that right? And we would also need to call
ReinitializeParallelDSM() at the beginning of index vacuum or index
cleanup, since at the end of index vacuum we don't know which of the
two we will do next.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Dec 13, 2019 at 11:08 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 13 Dec 2019 at 14:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > >
> > > > > How about adding an additional argument to ReinitializeParallelDSM()
> > > > > that allows the number of workers to be reduced? That seems like it
> > > > > would be less confusing than what you have now, and would involve
> > > > > modify code in a lot fewer places.
> > > > >
> > > >
> > > > Yeah, we can do that.  We can maintain some information in
> > > > LVParallelState which indicates whether we need to reinitialize the
> > > > DSM before launching workers.  Sawada-San, do you see any problem with
> > > > this idea?
> > >
> > > I think the number of workers could be increased in cleanup phase. For
> > > example, if we have 1 brin index and 2 gin indexes then in bulkdelete
> > > phase we need only 1 worker but in cleanup we need 2 workers.
> > >
> >
> > I think it shouldn't be more than the number with which we have
> > created a parallel context, no?  If that is the case, then I think it
> > should be fine.
>
> Right. I thought that ReinitializeParallelDSM() with an additional
> argument would reduce DSM but I understand that it doesn't actually
> reduce DSM but just have a variable for the number of workers to
> launch, is that right?
>

Yeah, probably, we need to change the nworkers stored in the context
and it should be lesser than the value already stored in that number.

> And we also would need to call
> ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> cleanup since we don't know that we will do either index vacuum or
> index cleanup, at the end of index vacum.
>

Right.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, 13 Dec 2019 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 13, 2019 at 11:08 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Fri, 13 Dec 2019 at 14:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > > >
> > > > > > How about adding an additional argument to ReinitializeParallelDSM()
> > > > > > that allows the number of workers to be reduced? That seems like it
> > > > > > would be less confusing than what you have now, and would involve
> > > > > > modify code in a lot fewer places.
> > > > > >
> > > > >
> > > > > Yeah, we can do that.  We can maintain some information in
> > > > > LVParallelState which indicates whether we need to reinitialize the
> > > > > DSM before launching workers.  Sawada-San, do you see any problem with
> > > > > this idea?
> > > >
> > > > I think the number of workers could be increased in cleanup phase. For
> > > > example, if we have 1 brin index and 2 gin indexes then in bulkdelete
> > > > phase we need only 1 worker but in cleanup we need 2 workers.
> > > >
> > >
> > > I think it shouldn't be more than the number with which we have
> > > created a parallel context, no?  If that is the case, then I think it
> > > should be fine.
> >
> > Right. I thought that ReinitializeParallelDSM() with an additional
> > argument would reduce DSM but I understand that it doesn't actually
> > reduce DSM but just have a variable for the number of workers to
> > launch, is that right?
> >
>
> Yeah, probably, we need to change the nworkers stored in the context
> and it should be lesser than the value already stored in that number.
>
> > And we also would need to call
> > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > cleanup since we don't know that we will do either index vacuum or
> > index cleanup, at the end of index vacum.
> >
>
> Right.

I've attached the latest version patch set. These patches require the
gist vacuum patch[1]. The patch incorporates the review comments. In
the current version of the patch, only indexes that support parallel
vacuum and whose size is larger than min_parallel_index_scan_size can
participate in parallel vacuum. It's still unclear to me whether using
min_parallel_index_scan_size is the best approach, but I agreed to set
a lower bound on relation size. I separated the patch for
PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION from the main patch and
I'm working on that patch.
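
For reference, the per-index eligibility test is conceptually like this
(simplified sketch; the function name is illustrative and the details may
differ from the patch):

    static bool
    parallel_vacuum_index_is_eligible(Relation indrel)
    {
        /* the index AM must support being vacuumed by a parallel worker */
        if (!indrel->rd_indam->amcanparallelvacuum)
            return false;

        /* skip indexes too small to be worth a dedicated worker */
        if (RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
            return false;

        return true;
    }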

Please review it.

[1] https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Tue, 17 Dec 2019 at 18:07, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 13 Dec 2019 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Dec 13, 2019 at 11:08 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Fri, 13 Dec 2019 at 14:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > > > >
> > > > > > > How about adding an additional argument to ReinitializeParallelDSM()
> > > > > > > that allows the number of workers to be reduced? That seems like it
> > > > > > > would be less confusing than what you have now, and would involve
> > > > > > > modify code in a lot fewer places.
> > > > > > >
> > > > > >
> > > > > > Yeah, we can do that.  We can maintain some information in
> > > > > > LVParallelState which indicates whether we need to reinitialize the
> > > > > > DSM before launching workers.  Sawada-San, do you see any problem with
> > > > > > this idea?
> > > > >
> > > > > I think the number of workers could be increased in cleanup phase. For
> > > > > example, if we have 1 brin index and 2 gin indexes then in bulkdelete
> > > > > phase we need only 1 worker but in cleanup we need 2 workers.
> > > > >
> > > >
> > > > I think it shouldn't be more than the number with which we have
> > > > created a parallel context, no?  If that is the case, then I think it
> > > > should be fine.
> > >
> > > Right. I thought that ReinitializeParallelDSM() with an additional
> > > argument would reduce DSM but I understand that it doesn't actually
> > > reduce DSM but just have a variable for the number of workers to
> > > launch, is that right?
> > >
> >
> > Yeah, probably, we need to change the nworkers stored in the context
> > and it should be lesser than the value already stored in that number.
> >
> > > And we also would need to call
> > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > > cleanup since we don't know that we will do either index vacuum or
> > > index cleanup, at the end of index vacum.
> > >
> >
> > Right.
>
> I've attached the latest version patch set. These patches requires the
> gist vacuum patch[1]. The patch incorporated the review comments. In
> current version patch only indexes that support parallel vacuum and
> whose size is larger than min_parallel_index_scan_size can participate
> parallel vacuum. I'm still not unclear to me that using
> min_parallel_index_scan_size is the best approach but I agreed to set
> a lower bound of relation size. I separated the patch for
> PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION  from the main patch and
> I'm working on that patch.
>
> Please review it.
>
> [1] https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com

Thanks for the updated patches.  I verified all my reported issues and all of them are fixed in the v36 patch set.

Below are some review comments:
1.
+   /* cap by max_parallel_maintenace_workers */                                                                                                                                  
+   parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);

Here, the spelling of max_parallel_maintenace_workers is wrong (correct: max_parallel_maintenance_workers).

2.
+ * size of stats for each index.  Also, this function   Since currently we don't support parallel vacuum                                                                          
+ * for autovacuum we don't need to care about autovacuum_work_mem

Here, I think the first line should be changed because it is not grammatically correct.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Dec 17, 2019 at 6:07 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 13 Dec 2019 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > I think it shouldn't be more than the number with which we have
> > > > created a parallel context, no?  If that is the case, then I think it
> > > > should be fine.
> > >
> > > Right. I thought that ReinitializeParallelDSM() with an additional
> > > argument would reduce DSM but I understand that it doesn't actually
> > > reduce DSM but just have a variable for the number of workers to
> > > launch, is that right?
> > >
> >
> > Yeah, probably, we need to change the nworkers stored in the context
> > and it should be lesser than the value already stored in that number.
> >
> > > And we also would need to call
> > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > > cleanup since we don't know that we will do either index vacuum or
> > > index cleanup, at the end of index vacum.
> > >
> >
> > Right.
>
> I've attached the latest version patch set. These patches requires the
> gist vacuum patch[1]. The patch incorporated the review comments.
>

I was analyzing your changes related to ReinitializeParallelDSM() and
it seems like we might launch more workers than needed for the
bulkdelete phase.  While creating a parallel context, we used the
maximum of "workers required for the bulkdelete phase" and "workers
required for cleanup", but now if the number of workers required in
the bulkdelete phase is less than in the cleanup phase (as mentioned
by you in one example), then we would launch more workers than
necessary for the bulkdelete phase.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 17, 2019 at 6:07 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Fri, 13 Dec 2019 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > > I think it shouldn't be more than the number with which we have
> > > > > created a parallel context, no?  If that is the case, then I think it
> > > > > should be fine.
> > > >
> > > > Right. I thought that ReinitializeParallelDSM() with an additional
> > > > argument would reduce DSM but I understand that it doesn't actually
> > > > reduce DSM but just have a variable for the number of workers to
> > > > launch, is that right?
> > > >
> > >
> > > Yeah, probably, we need to change the nworkers stored in the context
> > > and it should be lesser than the value already stored in that number.
> > >
> > > > And we also would need to call
> > > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > > > cleanup since we don't know that we will do either index vacuum or
> > > > index cleanup, at the end of index vacum.
> > > >
> > >
> > > Right.
> >
> > I've attached the latest version patch set. These patches requires the
> > gist vacuum patch[1]. The patch incorporated the review comments.
> >
>
> I was analyzing your changes related to ReinitializeParallelDSM() and
> it seems like we might launch more number of workers for the
> bulkdelete phase.   While creating a parallel context, we used the
> maximum of "workers required for bulkdelete phase" and "workers
> required for cleanup", but now if the number of workers required in
> bulkdelete phase is lesser than a cleanup phase(as mentioned by you in
> one example), then we would launch more workers for bulkdelete phase.

Good catch. Currently, when creating a parallel context, the number of
workers passed to CreateParallelContext() is set not only to
pcxt->nworkers but also to pcxt->nworkers_to_launch. We would need to
specify the number of workers to actually launch after creating the
parallel context, or while creating it. Alternatively, I think we could
call ReinitializeParallelDSM() even the first time we run index vacuum.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 18 Dec 2019 at 03:39, Mahendra Singh <mahi6run@gmail.com> wrote:
>
>
> Thanks for updated patches.  I verified my all reported issues and all are fixed in v36 patch set.
>
> Below are some review comments:
> 1.
> +   /* cap by max_parallel_maintenace_workers */
> +   parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
>
> Here, spell of max_parallel_maintenace_workers is wrong.  (correct: max_parallel_maintenance_workers)
>
> 2.
> + * size of stats for each index.  Also, this function   Since currently we don't support parallel vacuum
> + * for autovacuum we don't need to care about autovacuum_work_mem
>
> Here, I think, 1st line should be changed because it is not looking correct as grammatically.

Thank you for reviewing and testing this patch. I'll incorporate your
comments in the next version of the patch.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Tue, 10 Dec 2019 at 00:30, Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Fri, 6 Dec 2019 at 10:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > I think it might be a good idea to change what we expect index AMs to
> > > do rather than trying to make anything that they happen to be doing
> > > right now work, no matter how crazy. In particular, suppose we say
> > > that you CAN'T add data on to the end of IndexBulkDeleteResult any
> > > more, and that instead the extra data is passed through a separate
> > > parameter. And then you add an estimate method that gives the size of
> > > the space provided by that parameter (and if the estimate method isn't
> > > defined then the extra parameter is passed as NULL) and document that
> > > the data stored there might get flat-copied.
> > >
> >
> > I think this is a good idea and serves the purpose we are trying to
> > achieve currently.  However, if there are any IndexAM that is using
> > the current way to pass stats with additional information, they would
> > need to change even if they don't want to use parallel vacuum
> > functionality (say because their indexes are too small or whatever
> > other reasons).  I think this is a reasonable trade-off and the
> > changes on their end won't be that big.  So, we should do this.
> >
> > > Now, you've taken the
> > > onus off of parallel vacuum to cope with any crazy thing a
> > > hypothetical AM might be doing, and instead you've defined the
> > > behavior of that hypothetical AM as wrong. If somebody really needs
> > > that, it's now their job to modify the index AM machinery further
> > > instead of your job to somehow cope.
> > >
> >
> > makes sense.
> >
> > > > Here, we have a need to reduce the number of workers.  Index Vacuum
> > > > has two different phases (index vacuum and index cleanup) which uses
> > > > the same parallel-context/DSM but both could have different
> > > > requirements for workers.  The second phase (cleanup) would normally
> > > > need fewer workers as if the work is done in the first phase, second
> > > > wouldn't need it, but we have exceptions like gin indexes where we
> > > > need it for the second phase as well because it takes the pass
> > > > over-index again even if we have cleaned the index in the first phase.
> > > > Now, consider the case where we have 3 btree indexes and 2 gin
> > > > indexes, we would need 5 workers for index vacuum phase and 2 workers
> > > > for index cleanup phase.  There are other cases too.
> > > >
> > > > We also considered to have a separate DSM for each phase, but that
> > > > appeared to have overhead without much benefit.
> > >
> > > How about adding an additional argument to ReinitializeParallelDSM()
> > > that allows the number of workers to be reduced? That seems like it
> > > would be less confusing than what you have now, and would involve
> > > modify code in a lot fewer places.
> > >
> >
> > Yeah, we can do that.  We can maintain some information in
> > LVParallelState which indicates whether we need to reinitialize the
> > DSM before launching workers.  Sawada-San, do you see any problem with
> > this idea?
> >
> >
> > > > > Is there any legitimate use case for parallel vacuum in combination
> > > > > with vacuum cost delay?
> > > > >
> > > >
> > > > Yeah, we also initially thought that it is not legitimate to use a
> > > > parallel vacuum with a cost delay.  But to get a wider view, we
> > > > started a separate thread [2] and there we reach to the conclusion
> > > > that we need a solution for throttling [3].
> > >
> > > OK, thanks for the pointer. This doesn't address the other part of my
> > > complaint, though, which is that the whole discussion between you and
> > > Dilip and Sawada-san presumes that you want the delays ought to be
> > > scattered across the workers roughly in proportion to their share of
> > > the I/O, and it seems NOT AT ALL clear that this is actually a
> > > desirable property. You're all assuming that, but none of you has
> > > justified it, and I think the opposite might be true in some cases.
> > >
> >
> > IIUC, your complaint is that in some cases, even if the I/O rate is
> > enough for one worker, we will still launch more workers and throttle
> > them.  The point is we can't know in advance how much I/O is required
> > for each index.  We can try to do that based on index size, but I
> > don't think that will be right because it is possible that for the
> > bigger index, we don't need to dirty the pages and most of the pages
> > are in shared buffers, etc.  The current algorithm won't use more I/O
> > than required and it will be good for cases where one or some of the
> > indexes are doing more I/O as compared to others and it will also work
> > equally good even when the indexes have a similar amount of work.  I
> > think we could do better if we can predict how much I/O each index
> > requires before actually scanning the index.
> >
> > I agree with the other points (add a FAST option for parallel vacuum
> > and document that parallel vacuum is still potentially throttled ...)
> > you made in a separate email.
> >
> >
> > > You're adding extra complexity for something that isn't a clear
> > > improvement.
> > >
> > > > Your understanding is correct.  How about if we modify it to something
> > > > like: "Note that parallel workers are alive only during index vacuum
> > > > or index cleanup but the leader process neither exits from the
> > > > parallel mode nor destroys the parallel context until the entire
> > > > parallel operation is finished." OR something like "The leader backend
> > > > holds the parallel context till the index vacuum and cleanup is
> > > > finished.  Both index vacuum and cleanup separately perform the work
> > > > with parallel workers."
> > >
> > > How about if you just delete it? You don't need a comment explaining
> > > that this caller of CreateParallelContext() does something which
> > > *every* caller of CreateParallelContext() must do. If you didn't do
> > > that, you'd fail assertions and everything would break, so *of course*
> > > you are doing it.
> > >
> >
> > Fair enough, we can just remove this part of the comment.
> >
>
> Hi All,
> Below is the brief about testing of v35 patch set.
>
> 1.
> All the test cases are passing on the top of v35 patch set (make check world and all contrib test cases)
>
> 2.
> By enabling PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION, "make check world" is passing.
>
> 3.
> After v35 patch, vacuum.sql regression test is taking too much time due to large number of inserts so by reducing number of tuples, we can reduce that time.
> +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i;
>
> here, instead of 100000, we can make 1000 to reduce time of this test case because we only want to test code and functionality.

As we added a check of min_parallel_index_scan_size in the v36 patch set
to decide whether to do a parallel vacuum, 1000 tuples are not enough to
trigger a parallel vacuum. I can see that we are not launching any workers
in the vacuum.sql test case and hence code coverage has also decreased. I
am not sure how to fix this.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com

>
> 4.
> I tested functionality of parallel vacuum with different server configuration setting and behavior is as per expected.
> shared_buffers, max_parallel_workers, max_parallel_maintenance_workers, vacuum_cost_limit, vacuum_cost_delay, maintenance_work_mem, max_worker_processes
>
> 5.
> index and table stats of parallel vacuum are matching with normal vacuum.
>
> postgres=# select * from pg_statio_all_tables where relname = 'test';
> relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
> -------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
> 16384 | public | test | 399 | 5000 | 3 | 0 | 0 | 0 | 0 | 0
> (1 row)
>
> 6.
> vacuum Progress Reporting is as per expectation.
> postgres=# select * from pg_stat_progress_vacuum;
>   pid  | datid | datname  | relid |        phase        | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
> -------+-------+----------+-------+---------------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
>  44161 | 13577 | postgres | 16384 | cleaning up indexes |           41650 |             41650 |              41650 |                  1 |        11184810 |         1000000
> (1 row)
>
> 7.
> If any worker(or main worker) got error, then immediately all the workers are exiting and action is marked as abort.
>
> 8.
> I tested parallel vacuum for all the types of indexes and by varying size of indexes, all are working and didn't got any unexpected behavior.
>
> 9.
> While doing testing, I found that if we delete all the tuples from table, then also size of btree indexes was not reducing.
>
> delete all tuples from table.
> before vacuum, total pages in btree index: 8000
> after vacuum(normal/parallel), total pages in btree index: 8000
> but size of table is reducing after deleting all the tuples.
> Can we add a check in vacuum to truncate all the pages of btree indexes if there is no tuple in table.
>
> Please let me know if you have any inputs for more testing.
>
> Thanks and Regards
> Mahendra Thalor
> EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I was analyzing your changes related to ReinitializeParallelDSM() and
> > it seems like we might launch more number of workers for the
> > bulkdelete phase.   While creating a parallel context, we used the
> > maximum of "workers required for bulkdelete phase" and "workers
> > required for cleanup", but now if the number of workers required in
> > bulkdelete phase is lesser than a cleanup phase(as mentioned by you in
> > one example), then we would launch more workers for bulkdelete phase.
>
> Good catch. Currently when creating a parallel context the number of
> workers passed to CreateParallelContext() is set not only to
> pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to
> specify the number of workers actually to launch after created the
> parallel context or when creating it. Or I think we call
> ReinitializeParallelDSM() even the first time running index vacuum.
>

How about just having a ReinitializeParallelWorkers() function which, for
now, would be called only from vacuum, before launching workers, even the
first time?
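
For illustration, a minimal sketch of what such a helper could look like
(assuming the existing pcxt->nworkers_to_launch field; this is just to show
the idea, not an actual implementation):

/*
 * Sketch of the proposed ReinitializeParallelWorkers(): adjust only the
 * number of workers to launch for the next phase, without touching the
 * DSM that was sized for pcxt->nworkers.
 */
#include "postgres.h"
#include "access/parallel.h"

void
ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
{
    /* We may only reduce the count, never exceed what the DSM was sized for. */
    Assert(pcxt->nworkers >= nworkers_to_launch);

    pcxt->nworkers_to_launch = nworkers_to_launch;
}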


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
[please trim extra text before responding]

On Wed, Dec 18, 2019 at 12:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Tue, 10 Dec 2019 at 00:30, Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> >
> > 3.
> > After v35 patch, vacuum.sql regression test is taking too much time due to large number of inserts so by reducing number of tuples, we can reduce that time.
> > +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i;
> >
> > here, instead of 100000, we can make 1000 to reduce time of this test case because we only want to test code and functionality.
>
> As we added check of min_parallel_index_scan_size in v36 patch set to
> decide parallel vacuum, 1000 tuples are not enough to do parallel
> vacuum. I can see that we are not launching any workers in vacuum.sql
> test case and hence, code coverage also decreased. I am not sure that
> how to fix this.
>

Try setting min_parallel_index_scan_size to 0 in the test case.
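
For example, something along these lines in vacuum.sql should be enough (a
sketch; pvactst and the PARALLEL option follow the table and syntax already
used by the patch set's tests):

-- Sketch: force parallel index vacuum even on small regression-test indexes.
SET min_parallel_index_scan_size = 0;
VACUUM (PARALLEL 2) pvactst;
RESET min_parallel_index_scan_size;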



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Dec 18, 2019 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I was analyzing your changes related to ReinitializeParallelDSM() and
> > > it seems like we might launch more number of workers for the
> > > bulkdelete phase.   While creating a parallel context, we used the
> > > maximum of "workers required for bulkdelete phase" and "workers
> > > required for cleanup", but now if the number of workers required in
> > > bulkdelete phase is lesser than a cleanup phase(as mentioned by you in
> > > one example), then we would launch more workers for bulkdelete phase.
> >
> > Good catch. Currently when creating a parallel context the number of
> > workers passed to CreateParallelContext() is set not only to
> > pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to
> > specify the number of workers actually to launch after created the
> > parallel context or when creating it. Or I think we call
> > ReinitializeParallelDSM() even the first time running index vacuum.
> >
>
> How about just having ReinitializeParallelWorkers which can be called
> only via vacuum even for the first time before the launch of workers
> as of now?
>

See in the attached what I have in mind.  Few other comments:

1.
+ shared->disable_delay = (params->options & VACOPT_FAST);

This should be part of the third patch.

2.
+lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
+ LVRelStats *vacrelstats, LVParallelState *lps,
+ int nindexes)
{
..
..
+ /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
+ nworkers = Min(nworkers, lps->pcxt->nworkers);
..
}

This should be an Assert.  In no case can the computed workers be more
than what we have in the context.

3.
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;

I think the second condition should be VACUUM_OPTION_PARALLEL_COND_CLEANUP.

I have fixed the above comments and some given by me earlier [1] in
the attached patch.  The attached patch is a diff on top of
v36-0002-Add-parallel-option-to-VACUUM-command.

Few other comments which I have not fixed:

4.
+ if (Irel[i]->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /* Skip indexes that don't participate parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
+ continue;

Won't we need to count the number of indexes that use
maintenance_work_mem only among the indexes that can participate in a
parallel vacuum?  If so, the above checks need to be reversed.

5.
/*
+ * Remember indexes that can participate parallel index vacuum and use
+ * it for index statistics initialization on DSM because the index
+ * size can get bigger during vacuum.
+ */
+ can_parallel_vacuum[i] = true;

I am not able to understand the second part of the comment ("because
the index size can get bigger during vacuum.").  What is its
relevance?

6.
+/*
+ * Vacuum or cleanup indexes that can be processed by only the leader process
+ * because these indexes don't support parallel operation at that phase.
+ * Therefore this function must be called by the leader process.
+ */
+static void
+vacuum_indexes_leader(Relation *Irel, int nindexes,
IndexBulkDeleteResult **stats,
+   LVRelStats *vacrelstats, LVParallelState *lps)
{
..

Why have you changed the order of the nindexes parameter?  I think in the
previous patch it was the last parameter, and that seems to be a better
place for it.  Also, I think after the latest modifications, you can
remove the second sentence in the above comment ("Therefore this
function must be called by the leader process.").

7.
+ for (i = 0; i < nindexes; i++)
+ {
+ bool leader_only = (get_indstats(lps->lvshared, i) == NULL ||
+    skip_parallel_vacuum_index(Irel[i], lps->lvshared));
+
+ /* Skip the indexes that can be processed by parallel workers */
+ if (!leader_only)
+ continue;

It is better to name this parameter as skip_index or something like that.


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BKBAt1JS%2BasDd7K9C10OtBiyuUC75y8LR6QVnD2wrsMw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Prabhat Sahu
Date:
Hi all,

While testing the v36 patch with a gist index, I came across the segmentation fault below.

-- PG Head+ v36_patch
create table tab1(c1 int, c2 text PRIMARY KEY, c3 bool, c4 timestamp without time zone, c5 timestamp with time zone, p point);
create index gist_idx1 on tab1 using gist(p);
create index gist_idx2 on tab1 using gist(p);
create index gist_idx3 on tab1 using gist(p);
create index gist_idx4 on tab1 using gist(p);
create index gist_idx5 on tab1 using gist(p);

-- Cancel the insert statement in middle:
postgres=# insert into tab1 (select x, x||'_c2', 'T', current_date-x/100, current_date-x/100,point (x,x) from generate_series(1,1000000) x);
^CCancel request sent
ERROR:  canceling statement due to user request

-- Segmentation fault during VACUUM(PARALLEL):
postgres=# vacuum(parallel 10) tab1 ;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

-- Below is the stack trace:
[centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.14650  postgres
Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
[New LWP 14650]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: centos postgres [local] VACUUM                    '.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000075e713 in intset_num_entries (intset=0x1f62) at integerset.c:353
353 return intset->num_entries;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x000000000075e713 in intset_num_entries (intset=0x1f62) at integerset.c:353
#1  0x00000000004cbe0f in gistvacuum_delete_empty_pages (info=0x7fff32f8eba0, stats=0x7f2923b3f4d8) at gistvacuum.c:478
#2  0x00000000004cb353 in gistvacuumcleanup (info=0x7fff32f8eba0, stats=0x7f2923b3f4d8) at gistvacuum.c:124
#3  0x000000000050dcca in index_vacuum_cleanup (info=0x7fff32f8eba0, stats=0x7f2923b3f4d8) at indexam.c:711
#4  0x00000000005079ba in lazy_cleanup_index (indrel=0x7f292e149560, stats=0x2db5e40, reltuples=0, estimated_count=false) at vacuumlazy.c:2380
#5  0x00000000005074f0 in vacuum_one_index (indrel=0x7f292e149560, stats=0x2db5e40, lvshared=0x7f2923b3f460, shared_indstats=0x7f2923b3f4d0,
    dead_tuples=0x7f2922fbe2c0) at vacuumlazy.c:2196
#6  0x0000000000507428 in vacuum_indexes_leader (Irel=0x2db5de0, nindexes=6, stats=0x2db5e38, vacrelstats=0x2db5cb0, lps=0x2db5e90) at vacuumlazy.c:2155
#7  0x0000000000507126 in lazy_parallel_vacuum_indexes (Irel=0x2db5de0, stats=0x2db5e38, vacrelstats=0x2db5cb0, lps=0x2db5e90, nindexes=6)
    at vacuumlazy.c:2045
#8  0x0000000000507770 in lazy_cleanup_indexes (Irel=0x2db5de0, stats=0x2db5e38, vacrelstats=0x2db5cb0, lps=0x2db5e90, nindexes=6) at vacuumlazy.c:2300
#9  0x0000000000506076 in lazy_scan_heap (onerel=0x7f292e1473b8, params=0x7fff32f8f3e0, vacrelstats=0x2db5cb0, Irel=0x2db5de0, nindexes=6, aggressive=false)
    at vacuumlazy.c:1675
#10 0x0000000000504228 in heap_vacuum_rel (onerel=0x7f292e1473b8, params=0x7fff32f8f3e0, bstrategy=0x2deb3a0) at vacuumlazy.c:475
#11 0x00000000006ea059 in table_relation_vacuum (rel=0x7f292e1473b8, params=0x7fff32f8f3e0, bstrategy=0x2deb3a0)
    at ../../../src/include/access/tableam.h:1432
#12 0x00000000006ecb74 in vacuum_rel (relid=16384, relation=0x2cf5cf8, params=0x7fff32f8f3e0) at vacuum.c:1885
#13 0x00000000006eac8d in vacuum (relations=0x2deb548, params=0x7fff32f8f3e0, bstrategy=0x2deb3a0, isTopLevel=true) at vacuum.c:440
#14 0x00000000006ea776 in ExecVacuum (pstate=0x2deaf90, vacstmt=0x2cf5de0, isTopLevel=true) at vacuum.c:241
#15 0x000000000091da3d in standard_ProcessUtility (pstmt=0x2cf5ea8, queryString=0x2cf51a0 "vacuum(parallel 10) tab1 ;", context=PROCESS_UTILITY_TOPLEVEL,
    params=0x0, queryEnv=0x0, dest=0x2cf6188, completionTag=0x7fff32f8f840 "") at utility.c:665
#16 0x000000000091d270 in ProcessUtility (pstmt=0x2cf5ea8, queryString=0x2cf51a0 "vacuum(parallel 10) tab1 ;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
    queryEnv=0x0, dest=0x2cf6188, completionTag=0x7fff32f8f840 "") at utility.c:359
#17 0x000000000091c187 in PortalRunUtility (portal=0x2d5c530, pstmt=0x2cf5ea8, isTopLevel=true, setHoldSnapshot=false, dest=0x2cf6188,
    completionTag=0x7fff32f8f840 "") at pquery.c:1175
#18 0x000000000091c39e in PortalRunMulti (portal=0x2d5c530, isTopLevel=true, setHoldSnapshot=false, dest=0x2cf6188, altdest=0x2cf6188,
    completionTag=0x7fff32f8f840 "") at pquery.c:1321
#19 0x000000000091b8c8 in PortalRun (portal=0x2d5c530, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2cf6188, altdest=0x2cf6188,
    completionTag=0x7fff32f8f840 "") at pquery.c:796
#20 0x00000000009156d4 in exec_simple_query (query_string=0x2cf51a0 "vacuum(parallel 10) tab1 ;") at postgres.c:1227
#21 0x0000000000919a1c in PostgresMain (argc=1, argv=0x2d1f608, dbname=0x2d1f520 "postgres", username=0x2d1f500 "centos") at postgres.c:4288
#22 0x000000000086de39 in BackendRun (port=0x2d174e0) at postmaster.c:4498
#23 0x000000000086d617 in BackendStartup (port=0x2d174e0) at postmaster.c:4189
#24 0x0000000000869992 in ServerLoop () at postmaster.c:1727
#25 0x0000000000869248 in PostmasterMain (argc=3, argv=0x2cefd70) at postmaster.c:1400
#26 0x0000000000778593 in main (argc=3, argv=0x2cefd70) at main.c:210



On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 18, 2019 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I was analyzing your changes related to ReinitializeParallelDSM() and
> > > it seems like we might launch more number of workers for the
> > > bulkdelete phase.   While creating a parallel context, we used the
> > > maximum of "workers required for bulkdelete phase" and "workers
> > > required for cleanup", but now if the number of workers required in
> > > bulkdelete phase is lesser than a cleanup phase(as mentioned by you in
> > > one example), then we would launch more workers for bulkdelete phase.
> >
> > Good catch. Currently when creating a parallel context the number of
> > workers passed to CreateParallelContext() is set not only to
> > pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to
> > specify the number of workers actually to launch after created the
> > parallel context or when creating it. Or I think we call
> > ReinitializeParallelDSM() even the first time running index vacuum.
> >
>
> How about just having ReinitializeParallelWorkers which can be called
> only via vacuum even for the first time before the launch of workers
> as of now?
>

See in the attached what I have in mind.  Few other comments:

1.
+ shared->disable_delay = (params->options & VACOPT_FAST);

This should be part of the third patch.

2.
+lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
+ LVRelStats *vacrelstats, LVParallelState *lps,
+ int nindexes)
{
..
..
+ /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
+ nworkers = Min(nworkers, lps->pcxt->nworkers);
..
}

This should be Assert.  In no case, the computed workers can be more
than what we have in context.

3.
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;

I think the second condition should be VACUUM_OPTION_PARALLEL_COND_CLEANUP.

I have fixed the above comments and some given by me earlier [1] in
the attached patch.  The attached patch is a diff on top of
v36-0002-Add-parallel-option-to-VACUUM-command.

Few other comments which I have not fixed:

4.
+ if (Irel[i]->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /* Skip indexes that don't participate parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
+ continue;

Won't we need to worry about the number of indexes that uses
maintenance_work_mem only for indexes that can participate in a
parallel vacuum? If so, the above checks need to be reversed.

5.
/*
+ * Remember indexes that can participate parallel index vacuum and use
+ * it for index statistics initialization on DSM because the index
+ * size can get bigger during vacuum.
+ */
+ can_parallel_vacuum[i] = true;

I am not able to understand the second part of the comment ("because
the index size can get bigger during vacuum.").  What is its
relevance?

6.
+/*
+ * Vacuum or cleanup indexes that can be processed by only the leader process
+ * because these indexes don't support parallel operation at that phase.
+ * Therefore this function must be called by the leader process.
+ */
+static void
+vacuum_indexes_leader(Relation *Irel, int nindexes,
IndexBulkDeleteResult **stats,
+   LVRelStats *vacrelstats, LVParallelState *lps)
{
..

Why you have changed the order of nindexes parameter?  I think in the
previous patch, it was the last parameter and that seems to be better
place for it.  Also, I think after the latest modifications, you can
remove the second sentence in the above comment ("Therefore this
function must be called by the leader process.).

7.
+ for (i = 0; i < nindexes; i++)
+ {
+ bool leader_only = (get_indstats(lps->lvshared, i) == NULL ||
+    skip_parallel_vacuum_index(Irel[i], lps->lvshared));
+
+ /* Skip the indexes that can be processed by parallel workers */
+ if (!leader_only)
+ continue;

It is better to name this parameter as skip_index or something like that.


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BKBAt1JS%2BasDd7K9C10OtBiyuUC75y8LR6QVnD2wrsMw%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


--

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.

The Postgres Database Company

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Few other comments which I have not fixed:
>

+    /* interface function to support parallel vacuum */
+    amestimateparallelvacuum_function amestimateparallelvacuum; /*
can be NULL */
 } IndexAmRoutine;

One more thing: why have you removed the estimate function from the API
patch?  It seems to me Robert has given a different suggestion [1] to
deal with it.  I think he suggests adding a new member like void
*private_data to IndexBulkDeleteResult and then providing an estimate
function.  See his email [1] for a detailed explanation.  Did I
misunderstand it, or have you handled it differently?  Can you please
share your thoughts on this?


[1] - https://www.postgresql.org/message-id/CA%2BTgmobjtHdLfQhmzqBNt7VEsz%2B5w3P0yy0-EsoT05yAJViBSQ%40mail.gmail.com
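
For reference, one possible shape of what is being referred to above (purely
illustrative; nothing here is an agreed or committed API, and the names come
from the quoted snippet and the description in [1]):

/*
 * Illustrative sketch only.  Instead of an AM appending data to the end of
 * IndexBulkDeleteResult (which the leader may flat-copy), the AM would
 * report the size of a separate AM-private area via an estimate callback,
 * and ambulkdelete/amvacuumcleanup would receive that area through an
 * extra pointer such as "void *private_data".
 */
#include "postgres.h"
#include "utils/rel.h"

/* AM reports how much AM-private space to reserve in the shared area. */
typedef Size (*amestimateparallelvacuum_function) (Relation indexRelation);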

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Dec 18, 2019 at 6:01 PM Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
Hi all,

While testing on v36 patch with gist index, I came across below segmentation fault.


It seems you forgot to apply the Gist index patch, as mentioned by Masahiko-San.  You need to first apply the patch at https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com and then apply the other v36 patches.  If you have already done that, then we need to investigate.  Kindly confirm.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Prabhat Sahu
Date:


On Wed, Dec 18, 2019 at 6:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 18, 2019 at 6:01 PM Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
Hi all,

While testing on v36 patch with gist index, I came across below segmentation fault.


It seems you forgot to apply the Gist index patch as mentioned by Masahiko-San.  You need to first apply the patch at https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com and then apply other v-36 patches.  If you have already done that, then we need to investigate.  Kindly confirm.

Yes, Amit, thanks for the suggestion. I had forgotten to apply the v4 patch.
I have retested the same scenario; now the issue is not reproducible and it is working fine.
--

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.

The Postgres Database Company

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Dec 18, 2019 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Few other comments which I have not fixed:
> >
>
> +    /* interface function to support parallel vacuum */
> +    amestimateparallelvacuum_function amestimateparallelvacuum; /*
> can be NULL */
>  } IndexAmRoutine;
>
> One more thing, why have you removed the estimate function for API
> patch?
>

Thinking about this again, it seems to me that what you have done here is
probably the right direction, because whatever else we do, we would either
have some untested code or we would need to write/enhance some IndexAM to
test this.  The point is that we don't have any IndexAM in core (after
working around the Gist index) which has this requirement, and we have not
even heard from anyone about such usage, so there is a good chance that
whatever we do might not be sufficient for an IndexAM that has such usage.

Now, we already provide an option for this: an IndexAM can set
VACUUM_OPTION_NO_PARALLEL to indicate that it can't participate in a
parallel vacuum.  So, I feel that if there is any IndexAM which would like
to pass more data along with IndexBulkDeleteResult, it can use that option.
It won't be very difficult to enhance or provide new APIs to support a
parallel vacuum if we come across such a usage.  I think we should just
modify the comments atop VACUUM_OPTION_NO_PARALLEL to mention this.  I
think this should be good enough for the first version of parallel vacuum,
considering we are able to support a parallel vacuum for all in-core
indexes.
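
As a concrete illustration (a sketch; the handler shape and the
amparallelvacuumoptions field name are assumptions based on this patch set,
not verified code), such an AM would simply opt out in its handler:

#include "postgres.h"
#include "fmgr.h"
#include "access/amapi.h"
#include "commands/vacuum.h"    /* VACUUM_OPTION_* from the patch set */

PG_FUNCTION_INFO_V1(myam_handler);

Datum
myam_handler(PG_FUNCTION_ARGS)
{
    IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

    /* ... set up the usual callbacks for this hypothetical AM ... */

    /* Extra state hangs off IndexBulkDeleteResult, so run leader-only. */
    amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL;

    PG_RETURN_POINTER(amroutine);
}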

Thoughts?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Dec 19, 2019 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 18, 2019 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Few other comments which I have not fixed:
> > >
> >
> > +    /* interface function to support parallel vacuum */
> > +    amestimateparallelvacuum_function amestimateparallelvacuum; /*
> > can be NULL */
> >  } IndexAmRoutine;
> >
> > One more thing, why have you removed the estimate function for API
> > patch?
> >
>
> Again thinking about this, it seems to me what you have done here is
> probably the right direction because whatever else we will do we need
> to have some untested code or we need to write/enhance some IndexAM to
> test this.  The point is that we don't have any IndexAM in the core
> (after working around Gist index) which has this requirement and we
> have not even heard from anyone of such usage, so there is a good
> chance that whatever we do might not be sufficient for the IndexAM
> that have such usage.
>
> Now, we are already providing an option that one can set
> VACUUM_OPTION_NO_PARALLEL to indicate that the IndexAM can't
> participate in a parallel vacuum.  So, I feel if there is any IndexAM
> which would like to pass more data along with IndexBulkDeleteResult,
> they can use that option.  It won't be very difficult to enhance or
> provide the new APIs to support a parallel vacuum if we come across
> such a usage.  I think we should just modify the comments atop
> VACUUM_OPTION_NO_PARALLEL to mention this.  I think this should be
> good enough for the first version of parallel vacuum considering we
> are able to support a parallel vacuum for all in-core indexes.
>
> Thoughts?
+1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, 19 Dec 2019 at 11:47, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 18, 2019 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Few other comments which I have not fixed:
> > >
> >
> > +    /* interface function to support parallel vacuum */
> > +    amestimateparallelvacuum_function amestimateparallelvacuum; /*
> > can be NULL */
> >  } IndexAmRoutine;
> >
> > One more thing, why have you removed the estimate function for API
> > patch?
> >
>
> Again thinking about this, it seems to me what you have done here is
> probably the right direction because whatever else we will do we need
> to have some untested code or we need to write/enhance some IndexAM to
> test this.  The point is that we don't have any IndexAM in the core
> (after working around Gist index) which has this requirement and we
> have not even heard from anyone of such usage, so there is a good
> chance that whatever we do might not be sufficient for the IndexAM
> that have such usage.
>
> Now, we are already providing an option that one can set
> VACUUM_OPTION_NO_PARALLEL to indicate that the IndexAM can't
> participate in a parallel vacuum.  So, I feel if there is any IndexAM
> which would like to pass more data along with IndexBulkDeleteResult,
> they can use that option.  It won't be very difficult to enhance or
> provide the new APIs to support a parallel vacuum if we come across
> such a usage.

Yeah, that's exactly what I was thinking; I was about to send such an
email. The idea is good, but I thought we could exclude this feature from
the first version of the patch because, once the gist index patch gets
committed, we still won't have any index AM in core that uses that
callback. That is, an index AM that does vacuum the way the current gist
indexes do should set VACUUM_OPTION_NO_PARALLEL, and we can discuss this
again when we get real feedback from index AM developers.

> I think we should just modify the comments atop
> VACUUM_OPTION_NO_PARALLEL to mention this.  I think this should be
> good enough for the first version of parallel vacuum considering we
> are able to support a parallel vacuum for all in-core indexes.

I added some comments about that in the v36 patch, but I have slightly
modified them.

I'll submit an updated version of the patch soon.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 18 Dec 2019 at 19:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 18, 2019 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > I was analyzing your changes related to ReinitializeParallelDSM() and
> > > > it seems like we might launch more number of workers for the
> > > > bulkdelete phase.   While creating a parallel context, we used the
> > > > maximum of "workers required for bulkdelete phase" and "workers
> > > > required for cleanup", but now if the number of workers required in
> > > > bulkdelete phase is lesser than a cleanup phase(as mentioned by you in
> > > > one example), then we would launch more workers for bulkdelete phase.
> > >
> > > Good catch. Currently when creating a parallel context the number of
> > > workers passed to CreateParallelContext() is set not only to
> > > pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to
> > > specify the number of workers actually to launch after created the
> > > parallel context or when creating it. Or I think we call
> > > ReinitializeParallelDSM() even the first time running index vacuum.
> > >
> >
> > How about just having ReinitializeParallelWorkers which can be called
> > only via vacuum even for the first time before the launch of workers
> > as of now?
> >
>
> See in the attached what I have in mind.  Few other comments:
>
> 1.
> + shared->disable_delay = (params->options & VACOPT_FAST);
>
> This should be part of the third patch.
>
> 2.
> +lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
> + LVRelStats *vacrelstats, LVParallelState *lps,
> + int nindexes)
> {
> ..
> ..
> + /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
> + nworkers = Min(nworkers, lps->pcxt->nworkers);
> ..
> }
>
> This should be Assert.  In no case, the computed workers can be more
> than what we have in context.
>
> 3.
> + if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
> + ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0))
> + nindexes_parallel_cleanup++;
>
> I think the second condition should be VACUUM_OPTION_PARALLEL_COND_CLEANUP.
>
> I have fixed the above comments and some given by me earlier [1] in
> the attached patch.  The attached patch is a diff on top of
> v36-0002-Add-parallel-option-to-VACUUM-command.

Thank you!

- /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
+ /*
+ * The number of workers required for parallel vacuum phase must be less
+ * than the number of workers with which parallel context is initialized.
+ */
+ Assert(lps->pcxt->nworkers >= nworkers);

Regarding the above change in your patch, I think we need to cap the
number of workers by lps->pcxt->nworkers because the number of workers
computed from lps->nindexes_parallel_XXX can be larger than the number
determined when creating the parallel context, for example when
max_parallel_maintenance_workers is smaller than the number of indexes
that can be vacuumed in parallel in the bulkdelete phase.
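
In code form, the intent is roughly the following (an illustrative fragment
rather than the exact patch hunk; the helper name is made up):

/*
 * The worker count computed for a phase from lps->nindexes_parallel_XXX can
 * exceed the count the parallel context was created with, e.g. when
 * max_parallel_maintenance_workers already limited the context, so a bare
 * Assert would fire here; cap instead.
 */
static int
cap_parallel_vacuum_workers(LVParallelState *lps, int nworkers)
{
    return Min(nworkers, lps->pcxt->nworkers);
}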

>
> Few other comments which I have not fixed:
>
> 4.
> + if (Irel[i]->rd_indam->amusemaintenanceworkmem)
> + nindexes_mwm++;
> +
> + /* Skip indexes that don't participate parallel index vacuum */
> + if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
> + RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
> + continue;
>
> Won't we need to worry about the number of indexes that uses
> maintenance_work_mem only for indexes that can participate in a
> parallel vacuum? If so, the above checks need to be reversed.

You're right. Fixed.

>
> 5.
> /*
> + * Remember indexes that can participate parallel index vacuum and use
> + * it for index statistics initialization on DSM because the index
> + * size can get bigger during vacuum.
> + */
> + can_parallel_vacuum[i] = true;
>
> I am not able to understand the second part of the comment ("because
> the index size can get bigger during vacuum.").  What is its
> relevance?

I meant that the indexes can get bigger even during vacuum. So we need
to check the size of the indexes and determine participation in the
parallel index vacuum in one place.

>
> 6.
> +/*
> + * Vacuum or cleanup indexes that can be processed by only the leader process
> + * because these indexes don't support parallel operation at that phase.
> + * Therefore this function must be called by the leader process.
> + */
> +static void
> +vacuum_indexes_leader(Relation *Irel, int nindexes,
> IndexBulkDeleteResult **stats,
> +   LVRelStats *vacrelstats, LVParallelState *lps)
> {
> ..
>
> Why you have changed the order of nindexes parameter?  I think in the
> previous patch, it was the last parameter and that seems to be better
> place for it.

Since some existing code places nindexes right after *Irel, I thought
it was more understandable, but I'm also fine with the previous order.

> Also, I think after the latest modifications, you can
> remove the second sentence in the above comment ("Therefore this
> function must be called by the leader process.).

Fixed.

>
> 7.
> + for (i = 0; i < nindexes; i++)
> + {
> + bool leader_only = (get_indstats(lps->lvshared, i) == NULL ||
> +    skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> +
> + /* Skip the indexes that can be processed by parallel workers */
> + if (!leader_only)
> + continue;
>
> It is better to name this parameter as skip_index or something like that.

Fixed.

Attached is the updated version of the patch. This version incorporates
the above comments and the comments from Mahendra. I also fixed one bug
around determining which indexes are vacuumed in parallel based on their
options and size. Please review it.
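
For clarity, the fixed determination discussed in comment 4 above looks
roughly like this (a sketch based on the quoted fragments; the
amparallelvacuumoptions field name is an assumption and this is not the
patch's exact code):

for (i = 0; i < nindexes; i++)
{
    uint8       vacoptions = Irel[i]->rd_indam->amparallelvacuumoptions;

    /* Skip indexes that can't participate in a parallel index vacuum. */
    if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
        RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
        continue;

    can_parallel_vacuum[i] = true;

    /* Count maintenance_work_mem users only among participating indexes. */
    if (Irel[i]->rd_indam->amusemaintenanceworkmem)
        nindexes_mwm++;
}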

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Wed, 18 Dec 2019 at 12:07, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [please trim extra text before responding]
>
> On Wed, Dec 18, 2019 at 12:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> > On Tue, 10 Dec 2019 at 00:30, Mahendra Singh <mahi6run@gmail.com> wrote:
> > >
> > >
> > > 3.
> > > After v35 patch, vacuum.sql regression test is taking too much time due to large number of inserts so by reducing number of tuples, we can reduce that time.
> > > +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i;
> > >
> > > here, instead of 100000, we can make 1000 to reduce time of this test case because we only want to test code and functionality.
> >
> > As we added check of min_parallel_index_scan_size in v36 patch set to
> > decide parallel vacuum, 1000 tuples are not enough to do parallel
> > vacuum. I can see that we are not launching any workers in vacuum.sql
> > test case and hence, code coverage also decreased. I am not sure that
> > how to fix this.
> >
>
> Try by setting min_parallel_index_scan_size to 0 in test case.

Thanks, Amit, for the fix.

Yes, we can add "set min_parallel_index_scan_size = 0;" to the vacuum.sql
test case. I tested by setting min_parallel_index_scan_size=0 and it
works fine.

@Masahiko san, please add the above line to the vacuum.sql test case.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Dec 19, 2019 at 11:11 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 18 Dec 2019 at 19:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> - /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
> - nworkers = Min(nworkers, lps->pcxt->nworkers);
> + /*
> + * The number of workers required for parallel vacuum phase must be less
> + * than the number of workers with which parallel context is initialized.
> + */
> + Assert(lps->pcxt->nworkers >= nworkers);
>
> Regarding the above change in your patch I think we need to cap the
> number of workers by lps->pcxt->nworkers because the computation of
> the number of indexes based on lps->nindexes_paralle_XXX can be larger
> than the number determined when creating the parallel context, for
> example, when max_parallel_maintenance_workers is smaller than the
> number of indexes that can be vacuumed in parallel at bulkdelete
> phase.
>

Oh, right, but then you can probably add a comment, as this is not so obvious.

> >
> > Few other comments which I have not fixed:
> >
> > 4.
> > + if (Irel[i]->rd_indam->amusemaintenanceworkmem)
> > + nindexes_mwm++;
> > +
> > + /* Skip indexes that don't participate parallel index vacuum */
> > + if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
> > + RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
> > + continue;
> >
> > Won't we need to worry about the number of indexes that uses
> > maintenance_work_mem only for indexes that can participate in a
> > parallel vacuum? If so, the above checks need to be reversed.
>
> You're right. Fixed.
>
> >
> > 5.
> > /*
> > + * Remember indexes that can participate parallel index vacuum and use
> > + * it for index statistics initialization on DSM because the index
> > + * size can get bigger during vacuum.
> > + */
> > + can_parallel_vacuum[i] = true;
> >
> > I am not able to understand the second part of the comment ("because
> > the index size can get bigger during vacuum.").  What is its
> > relevance?
>
> I meant that the indexes can be begger even during vacuum. So we need
> to check the size of indexes and determine participations of parallel
> index vacuum at one place.
>

Okay, but that doesn't go with the earlier part of the comment.  We
can either remove it or explain it a bit more.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Dec 19, 2019 at 12:41 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> Attached the updated version patch. This version patch incorporates
> the above comments and the comments from Mahendra. I also fixed one
> bug around determining the indexes that are vacuumed in parallel based
> on their option and size. Please review it.

I'm not enthusiastic about the fact that 0003 calls the fast option
'disable_delay' in some places. I think it would be more clear to call
it 'fast' everywhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, 19 Dec 2019 at 22:48, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Dec 19, 2019 at 12:41 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > Attached the updated version patch. This version patch incorporates
> > the above comments and the comments from Mahendra. I also fixed one
> > bug around determining the indexes that are vacuumed in parallel based
> > on their option and size. Please review it.
>
> I'm not enthusiastic about the fact that 0003 calls the fast option
> 'disable_delay' in some places. I think it would be more clear to call
> it 'fast' everywhere.
>

Agreed.

I've attached an updated version of the patch that incorporates all the
review comments I have gotten so far.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Prabhat Sahu
Date:
Hi,

While testing this feature with parallel vacuum on a TEMPORARY TABLE, I got a server crash on PG Head+V36_patch.
The changed configuration parameters and stack trace are below:

autovacuum = on  
max_worker_processes = 4
shared_buffers = 10MB
max_parallel_workers = 8
max_parallel_maintenance_workers = 8
vacuum_cost_limit = 2000
vacuum_cost_delay = 10
min_parallel_table_scan_size = 8MB
min_parallel_index_scan_size = 0

-- Stack trace:
[centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres
Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
[New LWP 1399]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: autovacuum worker   postgres                      '.
Program terminated with signal 6, Aborted.
#0  0x00007f4517d80337 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007f4517d80337 in raise () from /lib64/libc.so.6
#1  0x00007f4517d81a28 in abort () from /lib64/libc.so.6
#2  0x0000000000a96341 in ExceptionalCondition (conditionName=0xd18efb "strvalue != NULL", errorType=0xd18eeb "FailedAssertion",
    fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67
#3  0x0000000000b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 ".%s\"", args=0x7ffdb0e38538) at snprintf.c:442
#4  0x0000000000b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats 151 times>..., count=1024,
    fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at snprintf.c:195
#5  0x0000000000afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats 151 times>..., len=1024,
    fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at psprintf.c:110
#6  0x0000000000afd34b in appendStringInfoVA (str=0x7ffdb0e38550, fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538)
    at stringinfo.c:149
#7  0x0000000000a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:832
#8  0x00000000008588d2 in do_autovacuum () at autovacuum.c:2249
#9  0x0000000000857b29 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1689
#10 0x000000000085772f in StartAutoVacWorker () at autovacuum.c:1483
#11 0x000000000086e64f in StartAutovacuumWorker () at postmaster.c:5562
#12 0x000000000086e106 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5279
#13 <signal handler called>
#14 0x00007f4517e3f933 in __select_nocancel () from /lib64/libc.so.6
#15 0x0000000000869838 in ServerLoop () at postmaster.c:1691
#16 0x0000000000869212 in PostmasterMain (argc=3, argv=0x256bd70) at postmaster.c:1400
#17 0x000000000077855d in main (argc=3, argv=0x256bd70) at main.c:210
(gdb)

I have tried to reproduce this with all previously executed queries, but I am no longer able to reproduce it.


On Thu, Dec 19, 2019 at 11:26 AM Mahendra Singh <mahi6run@gmail.com> wrote:
On Wed, 18 Dec 2019 at 12:07, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [please trim extra text before responding]
>
> On Wed, Dec 18, 2019 at 12:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> > On Tue, 10 Dec 2019 at 00:30, Mahendra Singh <mahi6run@gmail.com> wrote:
> > >
> > >
> > > 3.
> > > After v35 patch, vacuum.sql regression test is taking too much time due to large number of inserts so by reducing number of tuples, we can reduce that time.
> > > +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i;
> > >
> > > here, instead of 100000, we can make 1000 to reduce time of this test case because we only want to test code and functionality.
> >
> > As we added check of min_parallel_index_scan_size in v36 patch set to
> > decide parallel vacuum, 1000 tuples are not enough to do parallel
> > vacuum. I can see that we are not launching any workers in vacuum.sql
> > test case and hence, code coverage also decreased. I am not sure that
> > how to fix this.
> >
>
> Try by setting min_parallel_index_scan_size to 0 in test case.

Thanks Amit for the fix.

Yes, we can add "set min_parallel_index_scan_size = 0;" in vacuum.sql
test case. I tested by setting min_parallel_index_scan_size=0 and it
is working fine.

@Masahiko san, please add above line in vacuum.sql test case.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com




--

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.

The Postgres Database Company

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Dec 20, 2019 at 5:17 PM Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
Hi,

While testing this feature with parallel vacuum on "TEMPORARY TABLE", I got a server crash on PG Head+V36_patch.

From the call stack, it is not clear whether it is related to the patch at all.  Have you checked your test with and without the patch?  I ask because the patch doesn't perform a parallel vacuum on temporary tables.
 
Changed configuration parameters and Stack trace are as below:

-- Stack trace:
[centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres
Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
[New LWP 1399]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: autovacuum worker   postgres                      '.
Program terminated with signal 6, Aborted.
#0  0x00007f4517d80337 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007f4517d80337 in raise () from /lib64/libc.so.6
#1  0x00007f4517d81a28 in abort () from /lib64/libc.so.6
#2  0x0000000000a96341 in ExceptionalCondition (conditionName=0xd18efb "strvalue != NULL", errorType=0xd18eeb "FailedAssertion",
    fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67
#3  0x0000000000b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 ".%s\"", args=0x7ffdb0e38538) at snprintf.c:442
#4  0x0000000000b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats 151 times>..., count=1024,
    fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at snprintf.c:195
#5  0x0000000000afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats 151 times>..., len=1024,
    fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at psprintf.c:110
#6  0x0000000000afd34b in appendStringInfoVA (str=0x7ffdb0e38550, fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538)
    at stringinfo.c:149
#7  0x0000000000a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:832
#8  0x00000000008588d2 in do_autovacuum () at autovacuum.c:2249

The call stack seems to indicate that the backend from which you were doing the operations on temporary tables crashed somehow, and that autovacuum then tried to clean up the orphaned temporary table.  It crashes while printing the message for dropping orphan tables.  Below is that message:

    ereport(LOG,
            (errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
                    get_database_name(MyDatabaseId),
                    get_namespace_name(classForm->relnamespace),
                    NameStr(classForm->relname))));

Now, it can fail the assertion only if one of the three parameters (database name, namespace, relname) is NULL, which I can't see any way to happen unless you have manually removed the namespace or the database.
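
Just to make that failure mode concrete, a defensive variant of the call site would look roughly like the sketch below.  This is only an illustration, not a proposed fix: get_database_name() and get_namespace_name() do return NULL when the corresponding OID no longer resolves, and a NULL argument for %s is exactly what trips the "strvalue != NULL" assertion seen in the trace.

    char   *dbname = get_database_name(MyDatabaseId);
    char   *nspname = get_namespace_name(classForm->relnamespace);

    if (dbname != NULL && nspname != NULL)
        ereport(LOG,
                (errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
                        dbname, nspname, NameStr(classForm->relname))));
    else
        elog(LOG, "autovacuum: dropping orphan temp table with unresolvable name");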

(gdb)

I have tried to reproduce the same with all previously executed queries but now I am not able to reproduce the same.


I am not sure how we can conclude from this whether there is any problem with this patch, unless you have some steps to show us what you have done.  It could happen if you somehow corrupted the database by manually removing stuff, or maybe there is some genuine bug, but it is not at all clear.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Fri, 20 Dec 2019 at 17:17, Prabhat Sahu
<prabhat.sahu@enterprisedb.com> wrote:
>
> Hi,
>
> While testing this feature with parallel vacuum on "TEMPORARY TABLE", I got a server crash on PG Head+V36_patch.
> Changed configuration parameters and Stack trace are as below:
>
> autovacuum = on
> max_worker_processes = 4
> shared_buffers = 10MB
> max_parallel_workers = 8
> max_parallel_maintenance_workers = 8
> vacuum_cost_limit = 2000
> vacuum_cost_delay = 10
> min_parallel_table_scan_size = 8MB
> min_parallel_index_scan_size = 0
>
> -- Stack trace:
> [centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres
> Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
> [New LWP 1399]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `postgres: autovacuum worker   postgres                      '.
> Program terminated with signal 6, Aborted.
> #0  0x00007f4517d80337 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-37.el7_7.2.x86_64libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64
libselinux-2.5-14.1.el7.x86_64openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64 
> (gdb) bt
> #0  0x00007f4517d80337 in raise () from /lib64/libc.so.6
> #1  0x00007f4517d81a28 in abort () from /lib64/libc.so.6
> #2  0x0000000000a96341 in ExceptionalCondition (conditionName=0xd18efb "strvalue != NULL", errorType=0xd18eeb
"FailedAssertion",
>     fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67
> #3  0x0000000000b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 ".%s\"", args=0x7ffdb0e38538) at
snprintf.c:442
> #4  0x0000000000b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177'
<repeats151 times>..., count=1024, 
>     fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at snprintf.c:195
> #5  0x0000000000afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177'
<repeats151 times>..., len=1024, 
>     fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at psprintf.c:110
> #6  0x0000000000afd34b in appendStringInfoVA (str=0x7ffdb0e38550, fmt=0xc5fa68 "autovacuum: dropping orphan temp
table\"%s.%s.%s\"", args=0x7ffdb0e38538) 
>     at stringinfo.c:149
> #7  0x0000000000a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:832
> #8  0x00000000008588d2 in do_autovacuum () at autovacuum.c:2249
> #9  0x0000000000857b29 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1689
> #10 0x000000000085772f in StartAutoVacWorker () at autovacuum.c:1483
> #11 0x000000000086e64f in StartAutovacuumWorker () at postmaster.c:5562
> #12 0x000000000086e106 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5279
> #13 <signal handler called>
> #14 0x00007f4517e3f933 in __select_nocancel () from /lib64/libc.so.6
> #15 0x0000000000869838 in ServerLoop () at postmaster.c:1691
> #16 0x0000000000869212 in PostmasterMain (argc=3, argv=0x256bd70) at postmaster.c:1400
> #17 0x000000000077855d in main (argc=3, argv=0x256bd70) at main.c:210
> (gdb)
>
> I have tried to reproduce the same with all previously executed queries but now I am not able to reproduce the same.

Thanks Prabhat for reporting this issue.

I am able to reproduce this issue at my end. I tested and verified
that this issue is not related to the parallel vacuum patch: I am able
to reproduce it on HEAD without the parallel vacuum patch (v37).

I will report this issue in a new thread with a reproducible test case.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 23 Dec 2019 at 16:24, Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Fri, 20 Dec 2019 at 17:17, Prabhat Sahu
> <prabhat.sahu@enterprisedb.com> wrote:
> >
> > Hi,
> >
> > While testing this feature with parallel vacuum on "TEMPORARY TABLE", I got a server crash on PG Head+V36_patch.
> > Changed configuration parameters and Stack trace are as below:
> >
> > autovacuum = on
> > max_worker_processes = 4
> > shared_buffers = 10MB
> > max_parallel_workers = 8
> > max_parallel_maintenance_workers = 8
> > vacuum_cost_limit = 2000
> > vacuum_cost_delay = 10
> > min_parallel_table_scan_size = 8MB
> > min_parallel_index_scan_size = 0
> >
> > -- Stack trace:
> > [centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres
> > Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
> > [New LWP 1399]
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `postgres: autovacuum worker   postgres                      '.
> > Program terminated with signal 6, Aborted.
> > #0  0x00007f4517d80337 in raise () from /lib64/libc.so.6
> > Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-37.el7_7.2.x86_64libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64
libselinux-2.5-14.1.el7.x86_64openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64 
> > (gdb) bt
> > #0  0x00007f4517d80337 in raise () from /lib64/libc.so.6
> > #1  0x00007f4517d81a28 in abort () from /lib64/libc.so.6
> > #2  0x0000000000a96341 in ExceptionalCondition (conditionName=0xd18efb "strvalue != NULL", errorType=0xd18eeb
"FailedAssertion",
> >     fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67
> > #3  0x0000000000b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 ".%s\"", args=0x7ffdb0e38538) at
snprintf.c:442
> > #4  0x0000000000b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177'
<repeats151 times>..., count=1024, 
> >     fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at snprintf.c:195
> > #5  0x0000000000afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177'
<repeats151 times>..., len=1024, 
> >     fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at psprintf.c:110
> > #6  0x0000000000afd34b in appendStringInfoVA (str=0x7ffdb0e38550, fmt=0xc5fa68 "autovacuum: dropping orphan temp
table\"%s.%s.%s\"", args=0x7ffdb0e38538) 
> >     at stringinfo.c:149
> > #7  0x0000000000a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:832
> > #8  0x00000000008588d2 in do_autovacuum () at autovacuum.c:2249
> > #9  0x0000000000857b29 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1689
> > #10 0x000000000085772f in StartAutoVacWorker () at autovacuum.c:1483
> > #11 0x000000000086e64f in StartAutovacuumWorker () at postmaster.c:5562
> > #12 0x000000000086e106 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5279
> > #13 <signal handler called>
> > #14 0x00007f4517e3f933 in __select_nocancel () from /lib64/libc.so.6
> > #15 0x0000000000869838 in ServerLoop () at postmaster.c:1691
> > #16 0x0000000000869212 in PostmasterMain (argc=3, argv=0x256bd70) at postmaster.c:1400
> > #17 0x000000000077855d in main (argc=3, argv=0x256bd70) at main.c:210
> > (gdb)
> >
> > I have tried to reproduce the same with all previously executed queries but now I am not able to reproduce the
same.
>
> Thanks Prabhat for reporting this issue.
>
> I am able to reproduce this issue at my end. I tested and verified
> that this issue is not related to parallel vacuum patch. I am able to
> reproduce this issue on HEAD without parallel vacuum patch(v37).
>
> I will report this issue in new thread with reproducible test case.

Thank you so much!

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Dec 20, 2019 at 12:13 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> I've attached the updated version patch that incorporated the all
> review comments I go so far.
>

I have further edited the first two patches posted by you.  The
changes include (a) changing the tests to reset the GUC, (b) removing
some stuff which is not required in this version, (c) moving some
variables around to put them in a better order, (d) changing comments
and a few other cosmetic things, and (e) adding commit messages for
the first two patches.

I think the first two patches attached in this email are in good shape
and we can commit those unless you or someone else has more comments
on them; the main parallel vacuum patch can still be improved by some
more testing/polishing/review.  I am planning to push the first two
patches next week after another pass.  The first two patches are
explained briefly below:

1. v4-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM:  It
allows us to delete empty pages in each pass during GIST VACUUM.
Earlier, we used to postpone deleting empty pages till the second
stage of vacuum to amortize the cost of scanning internal pages.
However, that can sometimes (say, if vacuum is canceled or errors out
between the first and second stages) delay the recycling of those
pages.  Another thing is that to facilitate deleting empty pages in
the second stage, we need to share the information about internal and
empty pages between different stages of vacuum.  It will be quite
tricky to share this information via DSM, which is required for the
main parallel vacuum patch.  Also, it will bring the logic to reclaim
deleted pages closer to nbtree, where we delete empty pages in each
pass.  Overall, the advantages of deleting empty pages in each pass
outweigh the advantages of postponing the same.  This patch is
discussed in detail in a separate thread [1].

2. v39-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch:
Introduce new fields amusemaintenanceworkmem and
amparallelvacuumoptions in IndexAmRoutine for parallel vacuum.  The
amusemaintenanceworkmem field tells whether a particular IndexAM uses
maintenance_work_mem or not.  This will help in controlling the memory
used by individual workers as otherwise, each worker can consume
memory equal to maintenance_work_mem.  This has been discussed in
detail in a separate thread as well [2].  The amparallelvacuumoptions
field tells whether a particular IndexAM participates in a parallel
vacuum and, if so, in which phase (bulkdelete, vacuumcleanup) of
vacuum.
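
For illustration, with these fields an index AM such as btree would
advertise its capabilities in its handler roughly as below.  This is
only a sketch; the flag names follow the patch's naming scheme and may
not match the attached version exactly:

    Datum
    bthandler(PG_FUNCTION_ARGS)
    {
        IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

        /* ... all of the existing am* fields are filled in as before ... */

        /* btree's bulkdelete/cleanup do not rely on maintenance_work_mem */
        amroutine->amusemaintenanceworkmem = false;

        /*
         * Do bulkdelete in parallel; do cleanup in parallel only when no
         * bulkdelete has been performed for the index in this vacuum.
         */
        amroutine->amparallelvacuumoptions =
            VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;

        PG_RETURN_POINTER(amroutine);
    }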


[1] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Mon, 23 Dec 2019 at 16:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 20, 2019 at 12:13 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I've attached the updated version patch that incorporated the all
> > review comments I go so far.
> >
>
> I have further edited the first two patches posted by you.  The
> changes include (a) changed tests to reset the guc, (b) removing some
> stuff which is not required in this version, (c) moving some variables
> around to make them in better order, (d) changed comments and few
> other cosmetic things and (e) commit messages for first two patches.
>
> I think the first two patches attached in this email are in good shape
> and we can commit those unless you or someone has more comments on
> them, the main parallel vacuum patch can still be improved by some
> more test/polish/review.  I am planning to push the first two patches
> next week after another pass.  The first two patches are explained in
> brief as below:
>
> 1. v4-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM:  It
> allows us to delete empty pages in each pass during GIST VACUUM.
> Earlier, we use to postpone deleting empty pages till the second stage
> of vacuum to amortize the cost of scanning internal pages.  However,
> that can sometimes (say vacuum is canceled or errored between first
> and second stage) delay the pages to be recycled.  Another thing is
> that to facilitate deleting empty pages in the second stage, we need
> to share the information of internal and empty pages between different
> stages of vacuum.  It will be quite tricky to share this information
> via DSM which is required for the main parallel vacuum patch.  Also,
> it will bring the logic to reclaim deleted pages closer to nbtree
> where we delete empty pages in each pass.  Overall, the advantages of
> deleting empty pages in each pass outweigh the advantages of
> postponing the same.  This patch is discussed in detail in a separate
> thread [1].
>
> 2. v39-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch:
> Introduce new fields amusemaintenanceworkmem and
> amparallelvacuumoptions in IndexAmRoutine for parallel vacuum.  The
> amusemaintenanceworkmem tells whether a particular IndexAM uses
> maintenance_work_mem or not.  This will help in controlling the memory
> used by individual workers as otherwise, each worker can consume
> memory equal to maintenance_work_mem.  This has been discussed in
> detail in a separate thread as well [2]. The amparallelvacuumoptions
> tell whether a particular IndexAM participates in a parallel vacuum
> and if so in which phase (bulkdelete, vacuumcleanup) of vacuum.
>
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com
>

Hi,
I reviewed the v39 patch set. Below are some minor review comments:

1.
+     * memory equal to maitenance_work_mem, the new maitenance_work_mem for

maitenance_work_mem should be replaced by maintenance_work_mem.

2.
+ * The number of workers can vary between and bulkdelete and cleanup

I think the above sentence is not grammatically correct; there is an extra "and" in it.

3.
+ /*
+ * Open table.  The lock mode is the same as the leader process.  It's
+ * okay because The lockmode does not conflict among the parallel workers.
+ */

I think, "lock mode" and "lockmode", both should be same.(means extra space should be removed from "lock mode"). In "The", "T" should be small case letter.

4.
+ /* We don't support parallel vacuum for autovacuum for now */

I think the above comment should be something like "As of now, we don't support parallel vacuum for autovacuum".

5. I am not sure whether I am right, but I can see that we are not consistent in how we end single-line comments.

I think that if a single-line comment starts with an upper case letter, then we should not put a period (dot) at the end of the comment, but if the comment starts with a lower case letter, then we should put a period (dot) at the end.

a)
+ /* parallel vacuum must be active */
I think we should either end the above comment with a dot or make the "p" of "parallel" an upper case letter.

b)
+ /* At least count itself */
I think the above is correct.

If my understanding is correct, then please let me know so that I can make these changes on top of the v39 patch set.

6.
+    bool        amusemaintenanceworkmem;

I think we haven't run pgindent.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Dec 23, 2019 at 11:02 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
> 5. I am not sure that I am right but I can see that we are not consistent while ending the single line comments.
>
> I think, if single line comment is started with "upper case letter", then we should not put period(dot) at the end of
comment,but if comment started with "lower case letter", then we should put period(dot) at the end of comment.
 
>
> a)
> + /* parallel vacuum must be active */
> I think. we should end above comment with dot or we should make "p" of parallel as upper case letter.
>
> b)
> + /* At least count itself */
> I think, above is correct.
>

I have checked a few files in this context and I don't see any
consistency, so I would suggest keeping things consistent with the
nearby code.  Do you have any reason for the above conclusion?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 23 Dec 2019 at 19:41, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 20, 2019 at 12:13 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I've attached the updated version patch that incorporated the all
> > review comments I go so far.
> >
>
> I have further edited the first two patches posted by you.  The
> changes include (a) changed tests to reset the guc, (b) removing some
> stuff which is not required in this version, (c) moving some variables
> around to make them in better order, (d) changed comments and few
> other cosmetic things and (e) commit messages for first two patches.
>
> I think the first two patches attached in this email are in good shape
> and we can commit those unless you or someone has more comments on
> them, the main parallel vacuum patch can still be improved by some
> more test/polish/review.  I am planning to push the first two patches
> next week after another pass.  The first two patches are explained in
> brief as below:
>
> 1. v4-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM:  It
> allows us to delete empty pages in each pass during GIST VACUUM.
> Earlier, we use to postpone deleting empty pages till the second stage
> of vacuum to amortize the cost of scanning internal pages.  However,
> that can sometimes (say vacuum is canceled or errored between first
> and second stage) delay the pages to be recycled.  Another thing is
> that to facilitate deleting empty pages in the second stage, we need
> to share the information of internal and empty pages between different
> stages of vacuum.  It will be quite tricky to share this information
> via DSM which is required for the main parallel vacuum patch.  Also,
> it will bring the logic to reclaim deleted pages closer to nbtree
> where we delete empty pages in each pass.  Overall, the advantages of
> deleting empty pages in each pass outweigh the advantages of
> postponing the same.  This patch is discussed in detail in a separate
> thread [1].
>
> 2. v39-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch:
> Introduce new fields amusemaintenanceworkmem and
> amparallelvacuumoptions in IndexAmRoutine for parallel vacuum.  The
> amusemaintenanceworkmem tells whether a particular IndexAM uses
> maintenance_work_mem or not.  This will help in controlling the memory
> used by individual workers as otherwise, each worker can consume
> memory equal to maintenance_work_mem.  This has been discussed in
> detail in a separate thread as well [2]. The amparallelvacuumoptions
> tell whether a particular IndexAM participates in a parallel vacuum
> and if so in which phase (bulkdelete, vacuumcleanup) of vacuum.
>
>

Thank you for updating the patches!

The first patches look good to me. I'm reviewing other patches and
will post comments if there is.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
>
> The first patches look good to me. I'm reviewing other patches and
> will post comments if there is.
>

Okay, feel free to address few comments raised by Mahendra along with
whatever you find.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> >
> > The first patches look good to me. I'm reviewing other patches and
> > will post comments if there is.
> >

Oops I meant first "two" patches look good to me.

>
> Okay, feel free to address few comments raised by Mahendra along with
> whatever you find.

Thanks!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > >
> > > The first patches look good to me. I'm reviewing other patches and
> > > will post comments if there is.
> > >
>
> Oops I meant first "two" patches look good to me.
>
> >
> > Okay, feel free to address few comments raised by Mahendra along with
> > whatever you find.
>
> Thanks!
>

I've attached an updated patch set, as the previous version conflicts
with the current HEAD. This patch set incorporates the review
comments, a few fixes, and the patch for
PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. The 0001 patch is the
same as in the previous version.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh
Date:
On Wed, 25 Dec 2019 at 17:47, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > >
> > > > The first patches look good to me. I'm reviewing other patches and
> > > > will post comments if there is.
> > > >
> >
> > Oops I meant first "two" patches look good to me.
> >
> > >
> > > Okay, feel free to address few comments raised by Mahendra along with
> > > whatever you find.
> >
> > Thanks!
> >
>
> I've attached updated patch set as the previous version patch set
> conflicts to the current HEAD. This patch set incorporated the review
> comments, a few fix and the patch for
> PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same
> as previous version.

I verified all my review comments in the v40 patch set. All are fixed.

v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch doesn't
apply on HEAD. Please send a rebased patch.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Tomas Vondra
Date:
Hi,

On Wed, Dec 25, 2019 at 09:17:16PM +0900, Masahiko Sawada wrote:
>On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada
><masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada
>> > <masahiko.sawada@2ndquadrant.com> wrote:
>> > >
>> > >
>> > > The first patches look good to me. I'm reviewing other patches and
>> > > will post comments if there is.
>> > >
>>
>> Oops I meant first "two" patches look good to me.
>>
>> >
>> > Okay, feel free to address few comments raised by Mahendra along with
>> > whatever you find.
>>
>> Thanks!
>>
>
>I've attached updated patch set as the previous version patch set
>conflicts to the current HEAD. This patch set incorporated the review
>comments, a few fix and the patch for
>PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same
>as previous version.
>

I've been reviewing the updated patches over the past couple of days, so
let me share some initial review comments. I initially started to read
the thread, but then I realized it's futile - the thread is massive, and
the patch has changed so much that re-reading the whole thread is a
waste of time.

It might be useful to write a summary of the current design, but AFAICS
the original plan to parallelize the heap scan has been abandoned and we
now just do the index-vacuuming steps in parallel. Which is fine, but it
means the subject "block level parallel vacuum" is a bit misleading.

Anyway, most of the logic is implemented in part 0002, which actually
does all the parallel worker stuff. The remaining parts 0001, 0003 and
0004 are either preparing infrastructure or not directly related to the
primary feature.


v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch
-----------------------------------------------------------

I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe
it should be called just 'amvacuumoptions' or something like that? The
'parallel' part is actually encoded in names of the options.

Also, why do we need a separate amusemaintenanceworkmem option? Why
don't we simply track it using a separate flag in 'amvacuumoptions'
(or whatever we end up calling it)?

Would it make sense to track m_w_m usage separately for the two index
cleanup phases? Or is that unnecessary / pointless?


v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch
----------------------------------------------------------

I haven't found any issues yet, but I've only started with the code
review. I'll continue with the review. It seems in a fairly good shape
though, I think, I only have two minor comments at the moment:

- The SizeOfLVDeadTuples macro seems rather weird. It does include space
   for one ItemPointerData, but we really need an array of them. But then
   all the places where the macro is used explicitly add space for the
   pointers, so the sizeof(ItemPointerData) seems unnecessary. So it
   should be either

#define SizeOfLVDeadTuples (offsetof(LVDeadTuples, itemptrs))

   or 

#define SizeOfLVDeadTuples(cnt) \
   (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData))

   in which case the callers can be simplified.

- It's not quite clear to me why we need the new nworkers_to_launch
   field in ParallelContext.


v40-0003-Add-FAST-option-to-vacuum-command.patch
------------------------------------------------

I do have a bit of an issue with this part - I'm not quite convinced we
actually need a FAST option, and I actually suspect we'll come to regret
it sooner than later. AFAIK it pretty much does exactly the same thing
as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
multiple ways to do the same thing - I do expect reports from confused
users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
sufficient solution?

The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
we need a separate VACUUM option, instead of just using the existing
max_parallel_maintenance_workers GUC? It's good enough for CREATE INDEX
so why not here?

Maybe it's explained somewhere deep in the thread, of course ...


v40-0004-Add-ability-to-disable-leader-participation-in-p.patch
---------------------------------------------------------------

IMHO this should be simply merged into 0002.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, 27 Dec 2019 at 11:24, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> Hi,
>
> On Wed, Dec 25, 2019 at 09:17:16PM +0900, Masahiko Sawada wrote:
> >On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada
> ><masahiko.sawada@2ndquadrant.com> wrote:
> >>
> >> On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> >
> >> > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada
> >> > <masahiko.sawada@2ndquadrant.com> wrote:
> >> > >
> >> > >
> >> > > The first patches look good to me. I'm reviewing other patches and
> >> > > will post comments if there is.
> >> > >
> >>
> >> Oops I meant first "two" patches look good to me.
> >>
> >> >
> >> > Okay, feel free to address few comments raised by Mahendra along with
> >> > whatever you find.
> >>
> >> Thanks!
> >>
> >
> >I've attached updated patch set as the previous version patch set
> >conflicts to the current HEAD. This patch set incorporated the review
> >comments, a few fix and the patch for
> >PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same
> >as previous version.
> >
>
> I've been reviewing the updated patches over the past couple of days, so
> let me share some initial review comments. I initially started to read
> the thread, but then I realized it's futile - the thread is massive, and
> the patch changed so much re-reading the whole thread is a waste of time.

Thank you for reviewing this patch!

>
> It might be useful write a summary of the current design, but AFAICS the
> original plan to parallelize the heap scan is abandoned and we now do
> just the steps that vacuum indexes in parallel. Which is fine, but it
> means the subject "block level parallel vacuum" is a bit misleading.
>

Yeah I should have renamed it. I'll summarize the current design.

> Anyway, most of the logic is implemented in part 0002, which actually
> does all the parallel worker stuff. The remaining parts 0001, 0003 and
> 0004 are either preparing infrastructure or not directlyrelated to the
> primary feature.
>
>
> v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch
> -----------------------------------------------------------
>
> I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe
> it should be called just 'amvacuumoptions' or something like that? The
> 'parallel' part is actually encoded in names of the options.
>

amvacuumoptions seems good to me.

> Also, why do we need a separate amusemaintenanceworkmem option? Why
> don't we simply track it using a separate flag in 'amvacuumoptions'
> (or whatever we end up calling it)?
>

It also seems like a good idea.

> Would it make sense to track m_w_m usage separately for the two index
> cleanup phases? Or is that unnecessary / pointless?

We could do that, but currently the only index AM that uses this option
is gin. And gin indexes can use maintenance_work_mem during both
bulkdelete and cleanup. So it might be unnecessary, at least as of now.

>
>
> v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch
> ----------------------------------------------------------
>
> I haven't found any issues yet, but I've only started with the code
> review. I'll continue with the review. It seems in a fairly good shape
> though, I think, I only have two minor comments at the moment:
>
> - The SizeOfLVDeadTuples macro seems rather weird. It does include space
>    for one ItemPointerData, but we really need an array of them. But then
>    all the places where the macro is used explicitly add space for the
>    pointers, so the sizeof(ItemPointerData) seems unnecessary. So it
>    should be either
>
> #define SizeOfLVDeadTuples (offsetof(LVDeadTuples, itemptrs))
>
>    or
>
> #define SizeOfLVDeadTuples(cnt) \
>    (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData))
>
>    in which case the callers can be simplified.

Fixed it to the former.

>
> - It's not quite clear to me why we need the new nworkers_to_launch
>    field in ParallelContext.

The motivation for nworkers_to_launch is to specify the number of
workers to actually launch when we use the same parallel context
several times while changing the number of workers to launch. Since an
index AM can choose whether to participate in bulkdelete and/or
cleanup, the number of workers required for each vacuum phase can be
different. I originally changed LaunchParallelWorkers to take the
number of workers to launch, so that it launches a different number of
workers for each vacuum phase, but Robert suggested changing the
routine that reinitializes the parallel context instead[1]. That is
less confusing and involves modifying code in a lot fewer places. So
with this patch we specify the number of workers when initializing the
parallel context as the maximum number of workers, and then, using
ReinitializeParallelWorkers before doing either bulkdelete or cleanup,
we specify the number of workers to actually launch.
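
So the leader-side flow ends up looking roughly like the sketch below.
This is only schematic: "parallel_vacuum_main", max_workers_needed,
nworkers_for_phase and not_the_first_launch are placeholders standing
in for the entry point and per-phase computations, not necessarily the
exact names in the patch.

    ParallelContext *pcxt;

    /* set up once, sized for the maximum number of workers of any phase */
    EnterParallelMode();
    pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
                                 max_workers_needed);
    InitializeParallelDSM(pcxt);

    /* ... then, before each bulkdelete or cleanup phase ... */
    if (not_the_first_launch)
        ReinitializeParallelDSM(pcxt);  /* reset state before relaunch */
    ReinitializeParallelWorkers(pcxt, nworkers_for_phase);
    LaunchParallelWorkers(pcxt);
    /* the leader can also process some indexes itself here */
    WaitForParallelWorkersToFinish(pcxt);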

>
>
> v40-0003-Add-FAST-option-to-vacuum-command.patch
> ------------------------------------------------
>
> I do have a bit of an issue with this part - I'm not quite convinved we
> actually need a FAST option, and I actually suspect we'll come to regret
> it sooner than later. AFAIK it pretty much does exactly the same thing
> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
> multiple ways to do the same thing - I do expect reports from confused
> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
> sufficient solution?

I think the motivation for this option is similar to FREEZE. I think
it's sometimes a good idea to have a shortcut for a popular usage and
to give it a name corresponding to its job. From that perspective I
think having a FAST option would make sense, but maybe we need more
discussion on the combination of parallel vacuum and vacuum delay.

>
> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
> we need a separate VACUUM option, instead of just using the existing
> max_parallel_maintenance_workers GUC? It's good enough for CREATE INDEX
> so why not here?

AFAIR there was no such discussion so far, but I think one reason could
be that parallel vacuum should be disabled by default. If parallel
vacuum used max_parallel_maintenance_workers (2 by default) rather
than having the option, parallel vacuum would kick in with the default
settings, but I think that could have a big impact on users because
the disk access could become random reads and writes when some indexes
are on the same tablespace.

>
> Maybe it's explained somewhere deep in the thread, of course ...
>
>
> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch
> ---------------------------------------------------------------
>
> IMHO this should be simply merged into 0002.

We discussed that it's still unclear whether we really want to commit
this code, and therefore it's separated from the main part. Please see
more details here[2].

I've fixed the code based on the review comments and rebased it onto
the current HEAD. Some comments around the vacuum option name and the
FAST option are still open, as we need more discussion.

Regards,

[1] https://www.postgresql.org/message-id/CA%2BTgmobjtHdLfQhmzqBNt7VEsz%2B5w3P0yy0-EsoT05yAJViBSQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BC8OBhm4g3Mnfx%2BVjGfZ4ckLOLSU9i7Smo1sp4k0V5HA%40mail.gmail.com

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Tomas Vondra
Date:
On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote:
>On Fri, 27 Dec 2019 at 11:24, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> Hi,
>>
>> On Wed, Dec 25, 2019 at 09:17:16PM +0900, Masahiko Sawada wrote:
>> >On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada
>> ><masahiko.sawada@2ndquadrant.com> wrote:
>> >>
>> >> On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >> >
>> >> > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada
>> >> > <masahiko.sawada@2ndquadrant.com> wrote:
>> >> > >
>> >> > >
>> >> > > The first patches look good to me. I'm reviewing other patches and
>> >> > > will post comments if there is.
>> >> > >
>> >>
>> >> Oops I meant first "two" patches look good to me.
>> >>
>> >> >
>> >> > Okay, feel free to address few comments raised by Mahendra along with
>> >> > whatever you find.
>> >>
>> >> Thanks!
>> >>
>> >
>> >I've attached updated patch set as the previous version patch set
>> >conflicts to the current HEAD. This patch set incorporated the review
>> >comments, a few fix and the patch for
>> >PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same
>> >as previous version.
>> >
>>
>> I've been reviewing the updated patches over the past couple of days, so
>> let me share some initial review comments. I initially started to read
>> the thread, but then I realized it's futile - the thread is massive, and
>> the patch changed so much re-reading the whole thread is a waste of time.
>
>Thank you for reviewing this patch!
>
>>
>> It might be useful write a summary of the current design, but AFAICS the
>> original plan to parallelize the heap scan is abandoned and we now do
>> just the steps that vacuum indexes in parallel. Which is fine, but it
>> means the subject "block level parallel vacuum" is a bit misleading.
>>
>
>Yeah I should have renamed it. I'll summarize the current design.
>

OK

>> Anyway, most of the logic is implemented in part 0002, which actually
>> does all the parallel worker stuff. The remaining parts 0001, 0003 and
>> 0004 are either preparing infrastructure or not directlyrelated to the
>> primary feature.
>>
>>
>> v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch
>> -----------------------------------------------------------
>>
>> I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe
>> it should be called just 'amvacuumoptions' or something like that? The
>> 'parallel' part is actually encoded in names of the options.
>>
>
>amvacuumoptions seems good to me.
>
>> Also, why do we need a separate amusemaintenanceworkmem option? Why
>> don't we simply track it using a separate flag in 'amvacuumoptions'
>> (or whatever we end up calling it)?
>>
>
>It also seems like a good idea.
>

I think there's another question we need to ask - why do we introduce a
bitmask, instead of using regular boolean struct members? Until now, the
IndexAmRoutine struct had simple boolean members describing capabilities
of the AM implementation. Why shouldn't this patch do the same thing,
i.e. add one boolean flag for each AM feature?

>> Would it make sense to track m_w_m usage separately for the two index
>> cleanup phases? Or is that unnecessary / pointless?
>
>We could do that but currently index AM uses this option is only gin
>indexes. And gin indexes could use maintenance_work_mem both during
>bulkdelete and cleanup. So it might be unnecessary at least as of now.
>

OK

>>
>>
>> v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch
>> ----------------------------------------------------------
>>
>> I haven't found any issues yet, but I've only started with the code
>> review. I'll continue with the review. It seems in a fairly good shape
>> though, I think, I only have two minor comments at the moment:
>>
>> - The SizeOfLVDeadTuples macro seems rather weird. It does include space
>>    for one ItemPointerData, but we really need an array of them. But then
>>    all the places where the macro is used explicitly add space for the
>>    pointers, so the sizeof(ItemPointerData) seems unnecessary. So it
>>    should be either
>>
>> #define SizeOfLVDeadTuples (offsetof(LVDeadTuples, itemptrs))
>>
>>    or
>>
>> #define SizeOfLVDeadTuples(cnt) \
>>    (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData))
>>
>>    in which case the callers can be simplified.
>
>Fixed it to the former.
>

Hmmm, I'd actually suggest using the latter variant, because it allows
simplifying the callers. Just translating it to offsetof() is not saving
much code, I think.
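
For example (a sketch only; est_deadtuples and maxtuples are
illustrative names, not necessarily the variables used in the patch):

    /* with the macro taking a count ... */
    #define SizeOfLVDeadTuples(cnt) \
        (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData))

    /* ... a caller computing the shared space needed becomes simply: */
    est_deadtuples = MAXALIGN(SizeOfLVDeadTuples(maxtuples));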

>>
>> - It's not quite clear to me why we need the new nworkers_to_launch
>>    field in ParallelContext.
>
>The motivation of nworkers_to_launch is to specify the number of
>workers to actually launch when we use the same parallel context
>several times while changing the number of workers to launch. Since
>index AM can choose the participation of bulkdelete and/or cleanup,
>the number of workers required for each vacuum phrases can be
>different. I originally changed LaunchParallelWorkers to have the
>number of workers to launch so that it launches different number of
>workers for each vacuum phases but Robert suggested to change the
>routine of reinitializing parallel context[1]. It would be less
>confusing and would involve modify code in a lot fewer places. So with
>this patch we specify the number of workers during initializing the
>parallel context as a maximum number of workers. And using
>ReinitializeParallelWorkers before doing either bulkdelete or cleanup
>we specify the number of workers to launch.
>

Hmmm. I find it a bit confusing, but I don't know a better solution.

>>
>>
>> v40-0003-Add-FAST-option-to-vacuum-command.patch
>> ------------------------------------------------
>>
>> I do have a bit of an issue with this part - I'm not quite convinved we
>> actually need a FAST option, and I actually suspect we'll come to regret
>> it sooner than later. AFAIK it pretty much does exactly the same thing
>> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
>> multiple ways to do the same thing - I do expect reports from confused
>> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
>> sufficient solution?
>
>I think the motivation of this option is similar to FREEZE. I think
>it's sometimes a good idea to have a shortcut of popular usage and
>make it have an name corresponding to its job. From that perspective I
>think having FAST option would make sense but maybe we need more
>discussion the combination parallel vacuum and vacuum delay.
>

OK. I think it's a mostly independent piece, so maybe we should move it
to a separate patch. It's more likely to get attention/feedback when not
buried in this thread.

>>
>> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
>> we need a separate VACUUM option, instead of just using the existing
>> max_parallel_maintenance_workers GUC? It's good enough for CREATE INDEX
>> so why not here?
>
>AFAIR There was no such discussion so far but I think one reason could
>be that parallel vacuum should be disabled by default. If the parallel
>vacuum uses max_parallel_maintenance_workers (2 by default) rather
>than having the option the parallel vacuum would work with default
>setting but I think that it would become a big impact for user because
>the disk access could become random reads and writes when some indexes
>are on the same tablespace.
>

I'm not quite convinced VACUUM should have parallelism disabled by
default. I know some people argued we should do that because making
vacuum faster may put pressure on other parts of the system. Which is
true, but I don't think the solution is to make vacuum slower by
default. IMHO we should do the opposite - have it parallel by default
(as driven by max_parallel_maintenance_workers), and have an option
to disable parallelism.

It's pretty much the same thing we did with vacuum throttling - it's
disabled for explicit vacuum by default, but you can enable it. If
you're worried about VACUUM causing issues, you should set the cost
delay.

The way it's done now we pretty much don't handle either case without
having to tweak something:

- If you really want to go as fast as possible (e.g. during maintenance
   window) you have to say "PARALLEL".

- If you need to restrict VACUUM activity, you have to set cost_delay
   because just not using parallelism seems unreliable.

Of course, the question is what to do about autovacuum - I agree it may
make sense to have parallelism disabled in this case (just like we
already have throttling enabled by default for autovacuum).

>>
>> Maybe it's explained somewhere deep in the thread, of course ...
>>
>>
>> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch
>> ---------------------------------------------------------------
>>
>> IMHO this should be simply merged into 0002.
>
>We discussed it's still unclear whether we really want to commit this
>code and therefore it's separated from the main part. Please see more
>details here[2].
>

IMO there's not much reason for the leader not to participate. For
regular queries the leader may be doing useful stuff (essentially
running the non-parallel part of the query), but AFAIK for VACUUM that's
not the case and the leader would otherwise not be doing anything.

>I've fixed code based on the review comments and rebased to the
>current HEAD. Some comments around vacuum option name and FAST option
>are still left as we would need more discussion.
>

Thanks, I'll take a look.

regards

>--
>Masahiko Sawada            http://www.2ndQuadrant.com/
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services






-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote:
> >> v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch
> >> -----------------------------------------------------------
> >>
> >> I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe
> >> it should be called just 'amvacuumoptions' or something like that? The
> >> 'parallel' part is actually encoded in names of the options.
> >>
> >
> >amvacuumoptions seems good to me.
> >
> >> Also, why do we need a separate amusemaintenanceworkmem option? Why
> >> don't we simply track it using a separate flag in 'amvacuumoptions'
> >> (or whatever we end up calling it)?
> >>
> >
> >It also seems like a good idea.
> >
>
> I think there's another question we need to ask - why to we introduce a
> bitmask, instead of using regular boolean struct members? Until now, the
> IndexAmRoutine struct had simple boolean members describing capabilities
> of the AM implementation. Why shouldn't this patch do the same thing,
> i.e. add one boolean flag for each AM feature?
>

This structure member mostly describes a single property of the index,
namely its parallel vacuum capabilities, which I am not sure is true
for the other members.  Now, we could use separate bool variables for
it, which we were initially using in the patch, but that seems to take
more space in the structure without any advantage.  Also, using one
variable makes the code a bit better because otherwise, in many
places, we would need to check and set four variables instead of one.
This is also the reason we used parallel in its name (we also use
*parallel* for parallel index scan related things).  Having said that,
we can remove parallel from its name if we want to extend/use it for
something other than a parallel vacuum.  I think we might need to add
a flag or two for parallelizing the heap scan of vacuum when we
enhance this feature, so keeping it for just a parallel vacuum is not
completely insane.
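
In code terms, the difference is roughly the following (a sketch with
illustrative names; the hypothetical amcanparallel* booleans below are
not in any patch version, they just show the alternative):

    /* With the single bitmask member, a capability check stays a one-liner: */
    static bool
    index_can_parallel_bulkdel(IndexAmRoutine *amroutine)
    {
        return (amroutine->amparallelvacuumoptions &
                VACUUM_OPTION_PARALLEL_BULKDEL) != 0;
    }

    /*
     * With one boolean per capability, each index AM would have to
     * initialize, and each caller would have to test, something like four
     * separate members (e.g. amcanparallelbulkdel, amcanparallelcondcleanup,
     * amcanparallelcleanup, ...), which is the extra churn mentioned above.
     */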

I think keeping amusemaintenanceworkmem separate from this variable
seems to me like a better idea as it doesn't describe whether IndexAM
can participate in a parallel vacuum or not.  You can see more
discussion about that variable in the thread [1].

> >>
> >>
> >> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch
> >> ---------------------------------------------------------------
> >>
> >> IMHO this should be simply merged into 0002.
> >
> >We discussed it's still unclear whether we really want to commit this
> >code and therefore it's separated from the main part. Please see more
> >details here[2].
> >
>
> IMO there's not much reason for the leader not to participate.
>

The only reason for this is that it is a debugging/testing aid; during
the development of other parallel features we required such a knob.
The other way could be to have something similar to
force_parallel_mode, and there is some discussion about that as well
on this thread, but we haven't concluded which is better.  So, we
decided to keep it as a separate patch which we can use to test this
feature during development, and to decide later whether we really need
to commit it.  BTW, we have found a few bugs by using this knob in the
patch.
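
For reference, the knob in that patch is essentially a compile-time
switch; schematically it gates the leader's own index-vacuuming work
like this (the function name is a placeholder, not the exact code):

    #ifndef PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION
        /* the leader processes its share of indexes while workers run */
        leader_vacuum_or_cleanup_indexes();
    #endif

        WaitForParallelWorkersToFinish(pcxt);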

[1] - https://www.postgresql.org/message-id/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote:
> >>
> >> v40-0003-Add-FAST-option-to-vacuum-command.patch
> >> ------------------------------------------------
> >>
> >> I do have a bit of an issue with this part - I'm not quite convinved we
> >> actually need a FAST option, and I actually suspect we'll come to regret
> >> it sooner than later. AFAIK it pretty much does exactly the same thing
> >> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
> >> multiple ways to do the same thing - I do expect reports from confused
> >> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
> >> sufficient solution?
> >
> >I think the motivation of this option is similar to FREEZE. I think
> >it's sometimes a good idea to have a shortcut of popular usage and
> >make it have an name corresponding to its job. From that perspective I
> >think having FAST option would make sense but maybe we need more
> >discussion the combination parallel vacuum and vacuum delay.
> >
>
> OK. I think it's mostly independent piece, so maybe we should move it to
> a separate patch. It's more likely to get attention/feedback when not
> buried in this thread.
>

+1.  It is already a separate patch, and I think we can even discuss
it further in a new thread once the main patch is committed - or do
you think we should reach a conclusion about it right now?  To me,
this option appears to be an extension to the main feature which can
be useful for some users, and people might like to have a separate
option, so we can discuss it and get broader feedback after the main
patch is committed.

> >>
> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
> >> we need a separate VACUUM option, instead of just using the existing
> >> max_parallel_maintenance_workers GUC?
> >>

How will the user specify the parallel degree?  The parallel degree is
helpful because in some cases users can decide how many workers should
be launched based on the size and type of the indexes.

> >> It's good enough for CREATE INDEX
> >> so why not here?
> >

That is a different feature and I think here users can make a better
judgment based on the size of indexes.  Moreover, users have an option
to control a parallel degree for 'Create Index' via Alter Table
<tbl_name> Set (parallel_workers = <n>) which I am not sure is a good
idea for parallel vacuum as the parallelism is more derived from size
and type of indexes.  Now, we can think of a similar parameter at the
table/index level for parallel vacuum, but I don't see it equally
useful in this case.

> >AFAIR There was no such discussion so far but I think one reason could
> >be that parallel vacuum should be disabled by default. If the parallel
> >vacuum uses max_parallel_maintenance_workers (2 by default) rather
> >than having the option the parallel vacuum would work with default
> >setting but I think that it would become a big impact for user because
> >the disk access could become random reads and writes when some indexes
> >are on the same tablespace.
> >
>
> I'm not quite convinced VACUUM should have parallelism disabled by
> default. I know some people argued we should do that because making
> vacuum faster may put pressure on other parts of the system. Which is
> true, but I don't think the solution is to make vacuum slower by
> default. IMHO we should do the opposite - have it parallel by default
> (as driven by max_parallel_maintenance_workers), and have an option
> to disable parallelism.
>

I think driving parallelism for vacuum by
max_parallel_maintenance_workers might not be sufficient.  We need to
give finer control as it depends a lot on the size of indexes. Also,
unlike Create Index, Vacuum can be performed on an entire database and
it is quite possible that some tables/indexes are relatively smaller
and forcing parallelism on them by default might slow down the
operation.
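
To illustrate the kind of per-table decision involved, the computation
is conceptually along these lines (a deliberately simplified sketch;
compute_parallel_degree_sketch is a made-up name, and the real patch
also consults the per-AM vacuum options):

    static int
    compute_parallel_degree_sketch(Relation *indrels, int nindexes,
                                   int nrequested)
    {
        int         nindexes_parallel = 0;
        int         i;

        /* count the indexes that are big enough to benefit from a worker */
        for (i = 0; i < nindexes; i++)
        {
            if (RelationGetNumberOfBlocks(indrels[i]) >=
                min_parallel_index_scan_size)
                nindexes_parallel++;
        }

        /* the leader itself takes care of one index */
        nindexes_parallel--;
        if (nindexes_parallel <= 0)
            return 0;

        /* honor an explicit PARALLEL N request, if any */
        if (nrequested > 0)
            nindexes_parallel = Min(nrequested, nindexes_parallel);

        /* and finally cap by the GUC */
        return Min(nindexes_parallel, max_parallel_maintenance_workers);
    }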

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Tomas Vondra
Date:
On Mon, Dec 30, 2019 at 10:40:39AM +0530, Amit Kapila wrote:
>On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote:
>> >>
>> >> v40-0003-Add-FAST-option-to-vacuum-command.patch
>> >> ------------------------------------------------
>> >>
>> >> I do have a bit of an issue with this part - I'm not quite convinved we
>> >> actually need a FAST option, and I actually suspect we'll come to regret
>> >> it sooner than later. AFAIK it pretty much does exactly the same thing
>> >> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
>> >> multiple ways to do the same thing - I do expect reports from confused
>> >> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
>> >> sufficient solution?
>> >
>> >I think the motivation of this option is similar to FREEZE. I think
>> >it's sometimes a good idea to have a shortcut of popular usage and
>> >make it have an name corresponding to its job. From that perspective I
>> >think having FAST option would make sense but maybe we need more
>> >discussion the combination parallel vacuum and vacuum delay.
>> >
>>
>> OK. I think it's mostly independent piece, so maybe we should move it to
>> a separate patch. It's more likely to get attention/feedback when not
>> buried in this thread.
>>
>
>+1.  It is already a separate patch and I think we can even discuss
>more on it in a new thread once the main patch is committed or do you
>think we should have a conclusion about it now itself?  To me, this
>option appears to be an extension to the main feature which can be
>useful for some users and people might like to have a separate option,
>so we can discuss it and get broader feedback after the main patch is
>committed.
>

I don't think it's an extension of the main feature - it does not depend
on it, and it could be committed before or after the parallel vacuum (with
some conflicts, but the feature itself is not affected).

My point was that by moving it into a separate thread we're more likely
to get feedback on it, e.g. from people who don't feel like reviewing
the parallel vacuum feature and/or feel intimidated by the 100+ messages
in this thread.

>> >>
>> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
>> >> we need a separate VACUUM option, instead of just using the existing
>> >> max_parallel_maintenance_workers GUC?
>> >>
>
>How will user specify parallel degree?  The parallel degree is helpful
>because in some cases users can decide how many workers should be
>launched based on size and type of indexes.
>

By setting max_parallel_maintenance_workers.

>> >> It's good enough for CREATE INDEX
>> >> so why not here?
>> >
>
>That is a different feature and I think here users can make a better
>judgment based on the size of indexes.  Moreover, users have an option
>to control a parallel degree for 'Create Index' via Alter Table
><tbl_name> Set (parallel_workers = <n>) which I am not sure is a good
>idea for parallel vacuum as the parallelism is more derived from size
>and type of indexes.  Now, we can think of a similar parameter at the
>table/index level for parallel vacuum, but I don't see it equally
>useful in this case.
>

I'm a bit skeptical about users being able to pick a good parallel
degree. If we (i.e. experienced developers/hackers with quite a bit of
knowledge) can't come up with a reasonable heuristic, how likely is it
that a regular user will come up with something better?

Not sure I understand why "parallel_workers" would not be suitable for
parallel vacuum? I mean, even for CREATE INDEX the size/type of indexes
certainly matters, no?

I may be wrong in both cases, of course.

>> >AFAIR There was no such discussion so far but I think one reason could
>> >be that parallel vacuum should be disabled by default. If the parallel
>> >vacuum uses max_parallel_maintenance_workers (2 by default) rather
>> >than having the option the parallel vacuum would work with default
>> >setting but I think that it would become a big impact for user because
>> >the disk access could become random reads and writes when some indexes
>> >are on the same tablespace.
>> >
>>
>> I'm not quite convinced VACUUM should have parallelism disabled by
>> default. I know some people argued we should do that because making
>> vacuum faster may put pressure on other parts of the system. Which is
>> true, but I don't think the solution is to make vacuum slower by
>> default. IMHO we should do the opposite - have it parallel by default
>> (as driven by max_parallel_maintenance_workers), and have an option
>> to disable parallelism.
>>
>
>I think driving parallelism for vacuum by
>max_parallel_maintenance_workers might not be sufficient.  We need to
>give finer control as it depends a lot on the size of indexes. Also,
>unlike Create Index, Vacuum can be performed on an entire database and
>it is quite possible that some tables/indexes are relatively smaller
>and forcing parallelism on them by default might slow down the
>operation.
>

Why wouldn't it be sufficient? Why couldn't this use similar logic to
what we have in plan_create_index_workers for CREATE INDEX?

Sure, it may be useful to give power users a way to override the default
logic, but I very much doubt users can make reliable judgments about
parallelism.

Also, it's not like the risks are comparable in those two cases. If you
have a very large table with a lot of indexes, the gains with parallel
vacuum are pretty much bound to be significant, possibly 10x or more.
OTOH if the table is small, parallelism may not give you much and it may
even be less efficient, but I doubt it's going to be 10x slower. And
min_parallel_index_scan_size already protects us against this, at least
partially.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Tomas Vondra
Date:
On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote:
>On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote:
>> >> v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch
>> >> -----------------------------------------------------------
>> >>
>> >> I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe
>> >> it should be called just 'amvacuumoptions' or something like that? The
>> >> 'parallel' part is actually encoded in names of the options.
>> >>
>> >
>> >amvacuumoptions seems good to me.
>> >
>> >> Also, why do we need a separate amusemaintenanceworkmem option? Why
>> >> don't we simply track it using a separate flag in 'amvacuumoptions'
>> >> (or whatever we end up calling it)?
>> >>
>> >
>> >It also seems like a good idea.
>> >
>>
>> I think there's another question we need to ask - why to we introduce a
>> bitmask, instead of using regular boolean struct members? Until now, the
>> IndexAmRoutine struct had simple boolean members describing capabilities
>> of the AM implementation. Why shouldn't this patch do the same thing,
>> i.e. add one boolean flag for each AM feature?
>>
>
>This structure member describes mostly one property of index which is
>about a parallel vacuum which I am not sure is true for other members.
>Now, we can use separate bool variables for it which we were initially
>using in the patch but that seems to be taking more space in a
>structure without any advantage.  Also, using one variable makes a
>code bit better because otherwise, in many places we need to check and
>set four variables instead of one.  This is also the reason we used
>parallel in its name (we also use *parallel* for parallel index scan
>related things).  Having said that, we can remove parallel from its
>name if we want to extend/use it for something other than a parallel
>vacuum.  I think we might need to add a flag or two for parallelizing
>heap scan of vacuum when we enhance this feature, so keeping it for
>just a parallel vacuum is not completely insane.
>
>I think keeping amusemaintenanceworkmem separate from this variable
>seems to me like a better idea as it doesn't describe whether IndexAM
>can participate in a parallel vacuum or not.  You can see more
>discussion about that variable in the thread [1].
>

I don't know, but IMHO it's somewhat easier to work with separate flags.
Bitmasks make sense when space usage matters a lot, e.g. for on-disk
representation, but that doesn't seem to be the case here, I think (if it
were, we'd probably be using bitmasks already).

It seems like we're mixing two ways to design the struct unnecessarily,
but I'm not going to nag about this any further.

>> >>
>> >>
>> >> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch
>> >> ---------------------------------------------------------------
>> >>
>> >> IMHO this should be simply merged into 0002.
>> >
>> >We discussed it's still unclear whether we really want to commit this
>> >code and therefore it's separated from the main part. Please see more
>> >details here[2].
>> >
>>
>> IMO there's not much reason for the leader not to participate.
>>
>
>The only reason for this is just a debugging/testing aid because
>during the development of other parallel features we required such a
>knob.  The other way could be to have something similar to
>force_parallel_mode and there is some discussion about that as well on
>this thread but we haven't concluded which is better.  So, we decided
>to keep it as a separate patch which we can use to test this feature
>during development and decide later whether we really need to commit
>it.  BTW, we have found few bugs by using this knob in the patch.
>

OK, understood. Then why not just use force_parallel_mode?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote:
> >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >> I think there's another question we need to ask - why to we introduce a
> >> bitmask, instead of using regular boolean struct members? Until now, the
> >> IndexAmRoutine struct had simple boolean members describing capabilities
> >> of the AM implementation. Why shouldn't this patch do the same thing,
> >> i.e. add one boolean flag for each AM feature?
> >>
> >
> >This structure member describes mostly one property of index which is
> >about a parallel vacuum which I am not sure is true for other members.
> >Now, we can use separate bool variables for it which we were initially
> >using in the patch but that seems to be taking more space in a
> >structure without any advantage.  Also, using one variable makes a
> >code bit better because otherwise, in many places we need to check and
> >set four variables instead of one.  This is also the reason we used
> >parallel in its name (we also use *parallel* for parallel index scan
> >related things).  Having said that, we can remove parallel from its
> >name if we want to extend/use it for something other than a parallel
> >vacuum.  I think we might need to add a flag or two for parallelizing
> >heap scan of vacuum when we enhance this feature, so keeping it for
> >just a parallel vacuum is not completely insane.
> >
> >I think keeping amusemaintenanceworkmem separate from this variable
> >seems to me like a better idea as it doesn't describe whether IndexAM
> >can participate in a parallel vacuum or not.  You can see more
> >discussion about that variable in the thread [1].
> >
>
> I don't know, but IMHO it's somewhat easier to work with separate flags.
> Bitmasks make sense when space usage matters a lot, e.g. for on-disk
> representation, but that doesn't seem to be the case here I think (if it
> was, we'd probably use bitmasks already).
>
> It seems like we're mixing two ways to design the struct unnecessarily,
> but I'm not going to nag about this any further.
>

Fair enough.  I see your point; as mentioned earlier, we started with
the approach of separate booleans, but later found this to be a better
way as it was easier to set and check the different parallel options
for a parallel vacuum.   I think we can go back to the individual
booleans if we want, but I am not sure that is a better approach for
this usage.  Sawada-San, others, do you have any opinion here?

> >> >>
> >> >>
> >> >> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch
> >> >> ---------------------------------------------------------------
> >> >>
> >> >> IMHO this should be simply merged into 0002.
> >> >
> >> >We discussed it's still unclear whether we really want to commit this
> >> >code and therefore it's separated from the main part. Please see more
> >> >details here[2].
> >> >
> >>
> >> IMO there's not much reason for the leader not to participate.
> >>
> >
> >The only reason for this is just a debugging/testing aid because
> >during the development of other parallel features we required such a
> >knob.  The other way could be to have something similar to
> >force_parallel_mode and there is some discussion about that as well on
> >this thread but we haven't concluded which is better.  So, we decided
> >to keep it as a separate patch which we can use to test this feature
> >during development and decide later whether we really need to commit
> >it.  BTW, we have found few bugs by using this knob in the patch.
> >
>
> OK, understood. Then why not just use force_parallel_mode?
>

Because we are not sure what its behavior should be under different
modes, especially what we should do when the user sets
force_parallel_mode = on.  We can even consider introducing a new GUC
specific to this, but as of now, I am not convinced that is required.
See some more discussion around this parameter in emails [1][2].  I
think we can decide on this later (probably once the main patch is
committed) as we already have one way to test the patch.

[1] - https://www.postgresql.org/message-id/CAFiTN-sUuLASVXm2qOjufVH3tBZHPLdujMJ0RHr47Tnctjk9YA%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/CA%2Bfd4k6VgA_DG%3D8%3Dui7UvHhqx9VbQ-%2B72X%3D_GdTzh%3DJ_xN%2BVEg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Dec 30, 2019 at 6:37 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Mon, Dec 30, 2019 at 10:40:39AM +0530, Amit Kapila wrote:
> >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >>
> >
> >+1.  It is already a separate patch and I think we can even discuss
> >more on it in a new thread once the main patch is committed or do you
> >think we should have a conclusion about it now itself?  To me, this
> >option appears to be an extension to the main feature which can be
> >useful for some users and people might like to have a separate option,
> >so we can discuss it and get broader feedback after the main patch is
> >committed.
> >
>
> I don't think it's an extension of the main feature - it does not depend
> on it, it could be committed before or after the parallel vacuum (with
> some conflicts, but the feature itself is not affected).
>
> My point was that by moving it into a separate thread we're more likely
> to get feedback on it, e.g. from people who don't feel like reviewing
> the parallel vacuum feature and/or feel intimidated by t100+ messages in
> this thread.
>

I agree with this point.

> >> >>
> >> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
> >> >> we need a separate VACUUM option, instead of just using the existing
> >> >> max_parallel_maintenance_workers GUC?
> >> >>
> >
> >How will user specify parallel degree?  The parallel degree is helpful
> >because in some cases users can decide how many workers should be
> >launched based on size and type of indexes.
> >
>
> By setting max_maintenance_parallel_workers.
>
> >> >> It's good enough for CREATE INDEX
> >> >> so why not here?
> >> >
> >
> >That is a different feature and I think here users can make a better
> >judgment based on the size of indexes.  Moreover, users have an option
> >to control a parallel degree for 'Create Index' via Alter Table
> ><tbl_name> Set (parallel_workers = <n>) which I am not sure is a good
> >idea for parallel vacuum as the parallelism is more derived from size
> >and type of indexes.  Now, we can think of a similar parameter at the
> >table/index level for parallel vacuum, but I don't see it equally
> >useful in this case.
> >
>
> I'm a bit skeptical about users being able to pick good parallel degree.
> If we (i.e. experienced developers/hackers with quite a bit of
> knowledge) can't come up with a reasonable heuristics, how likely is it
> that a regular user will come up with something better?
>

In this case, it is highly dependent on the number of indexes (as for
each index, we can spawn one worker).   So, it is a bit easier for the
users to specify it.  Now, we can also identify the same internally, and
we do that in case the user doesn't specify it; however, that can
easily lead to more resource (CPU, I/O) usage than the user would like
for a particular vacuum.  So, giving an option to the user sounds quite
reasonable to me.  Anyway, in case the user doesn't specify the
parallel_degree, we are going to select one internally.

> Not sure I understand why "parallel_workers" would not be suitable for
> parallel vacuum? I mean, even for CREATE INDEX it certainly matters the
> size/type of indexes, no?
>

The difference here is that in a parallel vacuum each worker can scan a
separate index, whereas parallel_workers is more of an option for
scanning the heap in parallel.  So, if the size of the heap is bigger,
then increasing that value helps, whereas here, if there are more
indexes on the table, increasing the corresponding value for parallel
vacuum can help.
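
To make that distinction concrete, here is a minimal, compilable sketch
of deriving a per-index worker count.  The function name, the macro
standing in for max_parallel_maintenance_workers, and the exact capping
rules are illustrative assumptions, not the patch's actual code:

#include <stdio.h>

#define MAX_PARALLEL_MAINTENANCE_WORKERS 2   /* stands in for the GUC */

/*
 * Sketch: unlike the heap-oriented parallel_workers reloption, the
 * parallel vacuum degree is driven by how many indexes can be
 * processed in parallel.
 */
static int
compute_vacuum_workers(int nindexes_parallel, int requested_degree)
{
    int     nworkers;

    /* PARALLEL 0, or nothing worth parallelizing, disables parallelism */
    if (requested_degree == 0 || nindexes_parallel <= 1)
        return 0;

    /* an explicit user-specified degree wins; otherwise one worker per index */
    nworkers = (requested_degree > 0) ? requested_degree : nindexes_parallel;

    /* cap by the maintenance worker limit */
    if (nworkers > MAX_PARALLEL_MAINTENANCE_WORKERS)
        nworkers = MAX_PARALLEL_MAINTENANCE_WORKERS;

    return nworkers;
}

int
main(void)
{
    /* five parallel-capable indexes, degree not specified (-1) */
    printf("workers = %d\n", compute_vacuum_workers(5, -1));
    return 0;
}

(The real patch additionally considers things like
min_parallel_index_scan_size and per-phase index support; the sketch is
only meant to show that the degree tracks the index count rather than
the heap size.)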

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 31 Dec 2019 at 12:39, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote:
> > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
> > ><tomas.vondra@2ndquadrant.com> wrote:
> > >> I think there's another question we need to ask - why to we introduce a
> > >> bitmask, instead of using regular boolean struct members? Until now, the
> > >> IndexAmRoutine struct had simple boolean members describing capabilities
> > >> of the AM implementation. Why shouldn't this patch do the same thing,
> > >> i.e. add one boolean flag for each AM feature?
> > >>
> > >
> > >This structure member describes mostly one property of index which is
> > >about a parallel vacuum which I am not sure is true for other members.
> > >Now, we can use separate bool variables for it which we were initially
> > >using in the patch but that seems to be taking more space in a
> > >structure without any advantage.  Also, using one variable makes a
> > >code bit better because otherwise, in many places we need to check and
> > >set four variables instead of one.  This is also the reason we used
> > >parallel in its name (we also use *parallel* for parallel index scan
> > >related things).  Having said that, we can remove parallel from its
> > >name if we want to extend/use it for something other than a parallel
> > >vacuum.  I think we might need to add a flag or two for parallelizing
> > >heap scan of vacuum when we enhance this feature, so keeping it for
> > >just a parallel vacuum is not completely insane.
> > >
> > >I think keeping amusemaintenanceworkmem separate from this variable
> > >seems to me like a better idea as it doesn't describe whether IndexAM
> > >can participate in a parallel vacuum or not.  You can see more
> > >discussion about that variable in the thread [1].
> > >
> >
> > I don't know, but IMHO it's somewhat easier to work with separate flags.
> > Bitmasks make sense when space usage matters a lot, e.g. for on-disk
> > representation, but that doesn't seem to be the case here I think (if it
> > was, we'd probably use bitmasks already).
> >
> > It seems like we're mixing two ways to design the struct unnecessarily,
> > but I'm not going to nag about this any further.
> >
>
> Fair enough.  I see your point and as mentioned earlier that we
> started with the approach of separate booleans, but later found that
> this is a better way as it was easier to set and check the different
> parallel options for a parallel vacuum.   I think we can go back to
> the individual booleans if we want but I am not sure if that is a
> better approach for this usage.  Sawada-San, others, do you have any
> opinion here?

If we go back to the individual booleans we would end up with three
booleans: bulkdelete, cleanup and conditional cleanup. I think
making the bulkdelete option a boolean makes sense, but having two
booleans for cleanup and conditional cleanup might be slightly odd
because these options are exclusive. If we don't stick to having only
booleans, then having a ternary value for cleanup might be
understandable, but I'm not sure it's better to have it only for
vacuum purposes.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Jan 2, 2020 at 8:29 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 31 Dec 2019 at 12:39, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > >
> > > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote:
> > > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
> > > ><tomas.vondra@2ndquadrant.com> wrote:
> > > >> I think there's another question we need to ask - why to we introduce a
> > > >> bitmask, instead of using regular boolean struct members? Until now, the
> > > >> IndexAmRoutine struct had simple boolean members describing capabilities
> > > >> of the AM implementation. Why shouldn't this patch do the same thing,
> > > >> i.e. add one boolean flag for each AM feature?
> > > >>
> > > >
> > > >This structure member describes mostly one property of index which is
> > > >about a parallel vacuum which I am not sure is true for other members.
> > > >Now, we can use separate bool variables for it which we were initially
> > > >using in the patch but that seems to be taking more space in a
> > > >structure without any advantage.  Also, using one variable makes a
> > > >code bit better because otherwise, in many places we need to check and
> > > >set four variables instead of one.  This is also the reason we used
> > > >parallel in its name (we also use *parallel* for parallel index scan
> > > >related things).  Having said that, we can remove parallel from its
> > > >name if we want to extend/use it for something other than a parallel
> > > >vacuum.  I think we might need to add a flag or two for parallelizing
> > > >heap scan of vacuum when we enhance this feature, so keeping it for
> > > >just a parallel vacuum is not completely insane.
> > > >
> > > >I think keeping amusemaintenanceworkmem separate from this variable
> > > >seems to me like a better idea as it doesn't describe whether IndexAM
> > > >can participate in a parallel vacuum or not.  You can see more
> > > >discussion about that variable in the thread [1].
> > > >
> > >
> > > I don't know, but IMHO it's somewhat easier to work with separate flags.
> > > Bitmasks make sense when space usage matters a lot, e.g. for on-disk
> > > representation, but that doesn't seem to be the case here I think (if it
> > > was, we'd probably use bitmasks already).
> > >
> > > It seems like we're mixing two ways to design the struct unnecessarily,
> > > but I'm not going to nag about this any further.
> > >
> >
> > Fair enough.  I see your point and as mentioned earlier that we
> > started with the approach of separate booleans, but later found that
> > this is a better way as it was easier to set and check the different
> > parallel options for a parallel vacuum.   I think we can go back to
> > the individual booleans if we want but I am not sure if that is a
> > better approach for this usage.  Sawada-San, others, do you have any
> > opinion here?
>
> If we go back to the individual booleans we would end up with having
> three booleans: bulkdelete, cleanup and conditional cleanup. I think
> making the bulkdelete option to a boolean makes sense but having two
> booleans for cleanup and conditional cleanup might be slightly odd
> because these options are exclusive.
>

If we have only three booleans, then we need to check all three to
conclude that a parallel vacuum is not enabled for any index.
Alternatively, we can have a fourth boolean to indicate that a
parallel vacuum is not enabled.  And in the future, when we allow
supporting multiple workers for an index, we might need another
variable unless we can allow it for all types of indexes.  This was my
point: having multiple variables for the purpose of a parallel
vacuum (for indexes) doesn't sound like a better approach than having
a single uint8 variable.
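
For illustration, here is a small compilable sketch of the two designs
under discussion; the member and flag names are assumptions made up for
the example rather than the patch's final naming:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Design A: separate boolean members, one per parallel vacuum phase */
typedef struct IndexAmBools
{
    bool        amcanparallelbulkdel;
    bool        amcanparallelcleanup;
    bool        amcanparallelcondcleanup;
} IndexAmBools;

/* Design B: a single uint8 holding option bits */
#define VACUUM_OPTION_NO_PARALLEL           0
#define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 0)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 1)
#define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 2)

typedef struct IndexAmMask
{
    uint8_t     amparallelvacuumoptions;
} IndexAmMask;

int
main(void)
{
    IndexAmBools a = {true, false, true};
    IndexAmMask  b = {VACUUM_OPTION_PARALLEL_BULKDEL |
                      VACUUM_OPTION_PARALLEL_COND_CLEANUP};

    /* Design A: every member has to be checked to know if any support exists */
    bool    a_any = a.amcanparallelbulkdel ||
                    a.amcanparallelcleanup ||
                    a.amcanparallelcondcleanup;

    /* Design B: one comparison answers the same question */
    bool    b_any = (b.amparallelvacuumoptions != VACUUM_OPTION_NO_PARALLEL);

    printf("any parallel support: A=%d B=%d\n", a_any, b_any);
    return 0;
}

With Design B, a caller asking "does this AM support any parallel
vacuum phase at all?" needs a single test, which is the convenience
being argued for above.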

> If we don't stick to have only
> booleans the having a ternary value for cleanup might be
> understandable but I'm not sure it's better to have it for only vacuum
> purpose.
>

If we want to keep the possibility of extending it for other purposes,
then we can probably rename it to amoptions or something like that.
What do you think?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Dec 31, 2019 at 9:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote:
> > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
> > ><tomas.vondra@2ndquadrant.com> wrote:
> > >> I think there's another question we need to ask - why to we introduce a
> > >> bitmask, instead of using regular boolean struct members? Until now, the
> > >> IndexAmRoutine struct had simple boolean members describing capabilities
> > >> of the AM implementation. Why shouldn't this patch do the same thing,
> > >> i.e. add one boolean flag for each AM feature?
> > >>
> > >
> > >This structure member describes mostly one property of index which is
> > >about a parallel vacuum which I am not sure is true for other members.
> > >Now, we can use separate bool variables for it which we were initially
> > >using in the patch but that seems to be taking more space in a
> > >structure without any advantage.  Also, using one variable makes a
> > >code bit better because otherwise, in many places we need to check and
> > >set four variables instead of one.  This is also the reason we used
> > >parallel in its name (we also use *parallel* for parallel index scan
> > >related things).  Having said that, we can remove parallel from its
> > >name if we want to extend/use it for something other than a parallel
> > >vacuum.  I think we might need to add a flag or two for parallelizing
> > >heap scan of vacuum when we enhance this feature, so keeping it for
> > >just a parallel vacuum is not completely insane.
> > >
> > >I think keeping amusemaintenanceworkmem separate from this variable
> > >seems to me like a better idea as it doesn't describe whether IndexAM
> > >can participate in a parallel vacuum or not.  You can see more
> > >discussion about that variable in the thread [1].
> > >
> >
> > I don't know, but IMHO it's somewhat easier to work with separate flags.
> > Bitmasks make sense when space usage matters a lot, e.g. for on-disk
> > representation, but that doesn't seem to be the case here I think (if it
> > was, we'd probably use bitmasks already).
> >
> > It seems like we're mixing two ways to design the struct unnecessarily,
> > but I'm not going to nag about this any further.
> >
>
> Fair enough.  I see your point and as mentioned earlier that we
> started with the approach of separate booleans, but later found that
> this is a better way as it was easier to set and check the different
> parallel options for a parallel vacuum.   I think we can go back to
> the individual booleans if we want but I am not sure if that is a
> better approach for this usage.  Sawada-San, others, do you have any
> opinion here?
IMHO, having multiple bools will be confusing compared to what we have
now because these are all related to enabling parallelism for
different phases of the vacuum.  So it makes more sense to keep it as
a single variable with multiple options.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Jan 2, 2020 at 9:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 2, 2020 at 8:29 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 31 Dec 2019 at 12:39, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra
> > > <tomas.vondra@2ndquadrant.com> wrote:
> > > >
> > > > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote:
> > > > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
> > > > ><tomas.vondra@2ndquadrant.com> wrote:
> > > > >> I think there's another question we need to ask - why to we introduce a
> > > > >> bitmask, instead of using regular boolean struct members? Until now, the
> > > > >> IndexAmRoutine struct had simple boolean members describing capabilities
> > > > >> of the AM implementation. Why shouldn't this patch do the same thing,
> > > > >> i.e. add one boolean flag for each AM feature?
> > > > >>
> > > > >
> > > > >This structure member describes mostly one property of index which is
> > > > >about a parallel vacuum which I am not sure is true for other members.
> > > > >Now, we can use separate bool variables for it which we were initially
> > > > >using in the patch but that seems to be taking more space in a
> > > > >structure without any advantage.  Also, using one variable makes a
> > > > >code bit better because otherwise, in many places we need to check and
> > > > >set four variables instead of one.  This is also the reason we used
> > > > >parallel in its name (we also use *parallel* for parallel index scan
> > > > >related things).  Having said that, we can remove parallel from its
> > > > >name if we want to extend/use it for something other than a parallel
> > > > >vacuum.  I think we might need to add a flag or two for parallelizing
> > > > >heap scan of vacuum when we enhance this feature, so keeping it for
> > > > >just a parallel vacuum is not completely insane.
> > > > >
> > > > >I think keeping amusemaintenanceworkmem separate from this variable
> > > > >seems to me like a better idea as it doesn't describe whether IndexAM
> > > > >can participate in a parallel vacuum or not.  You can see more
> > > > >discussion about that variable in the thread [1].
> > > > >
> > > >
> > > > I don't know, but IMHO it's somewhat easier to work with separate flags.
> > > > Bitmasks make sense when space usage matters a lot, e.g. for on-disk
> > > > representation, but that doesn't seem to be the case here I think (if it
> > > > was, we'd probably use bitmasks already).
> > > >
> > > > It seems like we're mixing two ways to design the struct unnecessarily,
> > > > but I'm not going to nag about this any further.
> > > >
> > >
> > > Fair enough.  I see your point and as mentioned earlier that we
> > > started with the approach of separate booleans, but later found that
> > > this is a better way as it was easier to set and check the different
> > > parallel options for a parallel vacuum.   I think we can go back to
> > > the individual booleans if we want but I am not sure if that is a
> > > better approach for this usage.  Sawada-San, others, do you have any
> > > opinion here?
> >
> > If we go back to the individual booleans we would end up with having
> > three booleans: bulkdelete, cleanup and conditional cleanup. I think
> > making the bulkdelete option to a boolean makes sense but having two
> > booleans for cleanup and conditional cleanup might be slightly odd
> > because these options are exclusive.
> >
>
> If we have only three booleans, then we need to check for all three to
> conclude that a parallel vacuum is not enabled for any index.
> Alternatively, we can have a fourth boolean to indicate that a
> parallel vacuum is not enabled.  And in the future, when we allow
> supporting multiple workers for an index, we might need another
> variable unless we can allow it for all types of indexes.  This was my
> point that having multiple variables for the purpose of a parallel
> vacuum (for indexes) doesn't sound like a better approach than having
> a single uint8 variable.
>
> > If we don't stick to have only
> > booleans the having a ternary value for cleanup might be
> > understandable but I'm not sure it's better to have it for only vacuum
> > purpose.
> >
>
> If we want to keep the possibility of extending it for other purposes,
> then we can probably rename it to amoptions or something like that.
> What do you think?
I think it makes more sense to just keep it for the purpose of
enabling/disabling parallelism in different phases.  I am not sure
that adding more options (which are not related to enabling
parallelism in vacuum phases) to the same variable will make sense.
So I think the current name is good for its purpose.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Sun, Dec 29, 2019 at 4:23 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> IMO there's not much reason for the leader not to participate. For
> regular queries the leader may be doing useful stuff (essentially
> running the non-parallel part of the query) but AFAIK for VAUCUM that's
> not the case and the worker is not doing anything.

I agree, and said the same thing in
http://postgr.es/m/CA+Tgmob7JLrngeHz6i60_TqdvE1YBcvGYVoEQ6_xvP=vN7DwGg@mail.gmail.com

I really don't know why we have that code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Jan 3, 2020 at 10:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Dec 29, 2019 at 4:23 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > IMO there's not much reason for the leader not to participate. For
> > regular queries the leader may be doing useful stuff (essentially
> > running the non-parallel part of the query) but AFAIK for VAUCUM that's
> > not the case and the worker is not doing anything.
>
> I agree, and said the same thing in
> http://postgr.es/m/CA+Tgmob7JLrngeHz6i60_TqdvE1YBcvGYVoEQ6_xvP=vN7DwGg@mail.gmail.com
>
> I really don't know why we have that code.
>

We have removed that code from the main patch.  It is in a separate
patch and used mainly for development testing where we want to
debug/test the worker code.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Sat, 4 Jan 2020 at 07:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 3, 2020 at 10:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Sun, Dec 29, 2019 at 4:23 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > > IMO there's not much reason for the leader not to participate. For
> > > regular queries the leader may be doing useful stuff (essentially
> > > running the non-parallel part of the query) but AFAIK for VAUCUM that's
> > > not the case and the worker is not doing anything.
> >
> > I agree, and said the same thing in
> > http://postgr.es/m/CA+Tgmob7JLrngeHz6i60_TqdvE1YBcvGYVoEQ6_xvP=vN7DwGg@mail.gmail.com
> >
> > I really don't know why we have that code.
> >
>
> We have removed that code from the main patch.  It is in a separate
> patch and used mainly for development testing where we want to
> debug/test the worker code.
>

Hi All,

In the other thread "parallel vacuum options/syntax" [1], Amit Kapila asked for opinions about the syntax for making a normal vacuum parallel.  From that thread, I can see that people are in favor of implementing option (b).  So I implemented option (b) on top of the v41 patch set as a delta patch.

How will vacuum work?

If the user issues "vacuum" or "vacuum table_name", then we will launch workers based on the number of parallel-supported indexes.
Ex: vacuum table_name;
or vacuum (parallel) table_name;    // both are the same

If the user has requested a parallel degree (1-1024), then we will launch workers based on the requested degree and the parallel-supported indexes.
Ex: vacuum (parallel 8) table_name;

If the user doesn't want a parallel vacuum, they should set the parallel degree to zero.
Ex: vacuum (parallel 0) table_name;

I also did some testing and didn't find any issue after forcing a normal vacuum to run as a parallel vacuum.  All the test cases pass, and make check-world also passes.

Here I am attaching the delta patch that can be applied on top of the v41 patch set.  Apart from the delta patch, I am attaching the gist index patch (v4) and the full v41 patch set.

Please let me know your thoughts for this.

[1] : https://www.postgresql.org/message-id/CAA4eK1LBUfVQu7jCfL20MAF%2BRzUssP06mcBEcSZb8XktD7X1BA%40mail.gmail.com

--
Thanks and Regards
Mahendra Singh Thalor

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sat, Jan 4, 2020 at 6:48 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> Hi All,
>
> In other thread "parallel vacuum options/syntax" [1], Amit Kapila asked opinion about syntax for making normal vacuum
to parallel.  From that thread, I can see that people are in favor of option(b) to implement.  So I tried to implement
option(b) on the top of v41 patch set and implemented a delta patch.
>

I looked at your code and changed it slightly to allow the vacuum to
be performed in parallel by default.  Apart from that, I have made a
few other modifications: (a) changed the macro SizeOfLVDeadTuples as
preferred by Tomas [1], (b) updated the documentation, and (c) changed
a few comments.

The first two patches are the same.  I have not posted the patch
related to the FAST option, as I am not sure we have a consensus for
it, and I have also intentionally left out the
DISABLE_LEADER_PARTICIPATION-related patch to avoid confusion.

What do you think of the attached?  Sawada-san, kindly verify the
changes and let me know your opinion.

[1] - https://www.postgresql.org/message-id/20191229212354.tqivttn23lxjg2jz%40development

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Jan 4, 2020 at 6:48 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > Hi All,
> >
> > In other thread "parallel vacuum options/syntax" [1], Amit Kapila asked opinion about syntax for making normal
vacuum to parallel.  From that thread, I can see that people are in favor of option(b) to implement.  So I tried to
implement option(b) on the top of v41 patch set and implemented a delta patch.
> >
>
> I looked at your code and changed it slightly to allow the vacuum to
> be performed in parallel by default.  Apart from that, I have made a
> few other modifications (a) changed the macro SizeOfLVDeadTuples as
> preferred by Tomas [1], (b) updated the documentation, (c) changed a
> few comments.

Thanks.

>
> The first two patches are the same.  I have not posted the patch
> related to the FAST option as I am not sure we have a consensus for
> that and I have also intentionally left DISABLE_LEADER_PARTICIPATION
> related patch to avoid confusion.
>
> What do you think of the attached?  Sawada-san, kindly verify the
> changes and let me know your opinion.

I agreed not to include both the FAST option patch and the
DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus
on the main part, and we can discuss and add them later if we want.

I've looked at the latest version patch you shared. Overall it looks
good and works fine. I have a few small comments:

1.
+      refer to <xref linkend="vacuum-phases"/>).  If the
+      <literal>PARALLEL</literal>option or parallel degree

A space is needed between </literal> and 'option'.

2.
+       /*
+        * Variables to control parallel index vacuuming.  We have a bitmap to
+        * indicate which index has stats in shared memory.  The set bit in the
+        * map indicates that the particular index supports a parallel vacuum.
+        */
+       pg_atomic_uint32 idx;           /* counter for vacuuming and clean up */
+       pg_atomic_uint32 nprocessed;    /* # of indexes done during parallel
+                                        * execution */
+       uint32          offset;                 /* sizeof header incl. bitmap */
+       bits8           bitmap[FLEXIBLE_ARRAY_MEMBER];  /* bit map of NULLs */
+
+       /* Shared index statistics data follows at end of struct */
+} LVShared;

It seems to me that we no longer use nprocessed at all. So we can remove it.

3.
+ * Compute the number of parallel worker processes to request.  Both index
+ * vacuuming and index cleanup can be executed with parallel workers.  The
+ * relation sizes of table don't affect to the parallel degree for now.

I think the last sentence should be "The relation size of table
doesn't affect to the parallel degree for now".

4.
+       /* cap by max_parallel_maintenance_workers */
+       parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);

+       /*
+        * a parallel vacuum must be requested and there must be indexes on the
+        * relation
+        */

+       /* copy the updated statistics */

+       /* parallel vacuum must be active */
+       Assert(VacuumSharedCostBalance);

All comments newly added by the patches, except for the above four
places, start with a capital letter. Maybe we can change these for
consistency.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > What do you think of the attached?  Sawada-san, kindly verify the
> > changes and let me know your opinion.
>
> I agreed to not include both the FAST option patch and
> DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus
> on the main part and we can discuss and add them later if want.
>
> I've looked at the latest version patch you shared. Overall it looks
> good and works fine. I have a few small comments:
>

I have addressed all your comments, slightly changed nearby comments,
and ran pgindent.  I think we can commit the first two preparatory
patches now unless you or someone else has any more comments on those.
Tomas, most of your comments were in the main patch
(v43-0002-Allow-vacuum-command-to-process-indexes-in-parallel), which
are now addressed, and we have provided the reasons for the proposed
API changes in the patch
v43-0001-Introduce-IndexAM-fields-for-parallel-vacuum.  Do you have
any concerns if we commit the API patch and then, in a few days' time
(after another pass or two), commit the main patch?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
Hello

I noticed that parallel vacuum uses the min_parallel_index_scan_size GUC to skip small indexes, but this is not mentioned in
the documentation for either the vacuum command or the GUC itself.

+    /* Determine the number of parallel workers to launch */
+    if (lps->lvshared->for_cleanup)
+    {
+        if (lps->lvshared->first_time)
+            nworkers = lps->nindexes_parallel_cleanup +
+                lps->nindexes_parallel_condcleanup - 1;
+        else
+            nworkers = lps->nindexes_parallel_cleanup - 1;
+
+    }
+    else
+        nworkers = lps->nindexes_parallel_bulkdel - 1;

(lazy_parallel_vacuum_indexes)
Perhaps we need to add a comment for future readers explaining why we reduce the number of workers by 1. Maybe this would be
cleaner?

+    /* Determine the number of parallel workers to launch */
+    if (lps->lvshared->for_cleanup)
+    {
+        if (lps->lvshared->first_time)
+            nworkers = lps->nindexes_parallel_cleanup +
+                lps->nindexes_parallel_condcleanup;
+        else
+            nworkers = lps->nindexes_parallel_cleanup;
+
+    }
+    else
+        nworkers = lps->nindexes_parallel_bulkdel;
+
+   /* The leader process will participate */
+   nworkers--;

I have no more comments after reading the patches.

regards, Sergei



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Thu, 9 Jan 2020 at 17:31, Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hello
>
> I noticed that parallel vacuum uses min_parallel_index_scan_size GUC to skip small indexes but this is not mentioned
in documentation for both vacuum command and GUC itself.
>
> +       /* Determine the number of parallel workers to launch */
> +       if (lps->lvshared->for_cleanup)
> +       {
> +               if (lps->lvshared->first_time)
> +                       nworkers = lps->nindexes_parallel_cleanup +
> +                               lps->nindexes_parallel_condcleanup - 1;
> +               else
> +                       nworkers = lps->nindexes_parallel_cleanup - 1;
> +
> +       }
> +       else
> +               nworkers = lps->nindexes_parallel_bulkdel - 1;

v43-0001-Introduce-IndexAM-fields-for-parallel-vacuum and
v43-0001-Introduce-IndexAM-fields-for-parallel-vacuum patches look
fine to me.

Below are some review comments for v43-0002 patch.

1.
+    <term><replaceable class="parameter">integer</replaceable></term>
+    <listitem>
+     <para>
+      Specifies a positive integer value passed to the selected option.
+      The <replaceable class="parameter">integer</replaceable> value can
+      also be omitted, in which case the value is decided by the command
+      based on the option used.
+     </para>
+    </listitem

I think, since we now also support zero as a degree, it should be
changed from "positive integer" to "positive integer (including zero)".

2.
+ * with parallel worker processes.  Individual indexes are processed by one
+ * vacuum process.  At the beginning of a lazy vacuum (at lazy_scan_heap) we

I think the above sentence should be something like "Each individual
index is processed by one vacuum process." (or one worker)

3.
+ * Lazy vacuum supports parallel execution with parallel worker processes.  In
+ * a parallel lazy vacuum, we perform both index vacuuming and index cleanup

Here, "index vacuuming" should be changed to "index vacuum" or "index
cleanup" to "index cleaning"

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, 9 Jan 2020 at 19:33, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > What do you think of the attached?  Sawada-san, kindly verify the
> > > changes and let me know your opinion.
> >
> > I agreed to not include both the FAST option patch and
> > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus
> > on the main part and we can discuss and add them later if want.
> >
> > I've looked at the latest version patch you shared. Overall it looks
> > good and works fine. I have a few small comments:
> >
>
> I have addressed all your comments and slightly change nearby comments
> and ran pgindent.  I think we can commit the first two preparatory
> patches now unless you or someone else has any more comments on those.

Yes.

I'd like to briefly summarize
v43-0002-Allow-vacuum-command-to-process-indexes-in-parallel for other
reviewers who want to start reviewing this patch:

It introduces the PARALLEL option to the VACUUM command. Parallel vacuum
is enabled by default. The number of parallel workers is determined based
on the number of indexes that support parallel index vacuum when the user
doesn't specify the parallel degree or the PARALLEL option is omitted.
Specifying PARALLEL 0 disables parallel vacuum.

In the parallel vacuum of this patch, only the leader process does the
heap scan and collects dead tuple TIDs in the DSM segment. Before
starting index vacuum or index cleanup, the leader launches the parallel
workers and performs the phase together with them. Each individual index
is processed by one vacuum process. Therefore parallel vacuum can
be used when the table has at least 2 indexes (the leader always takes
one index). After launching the parallel workers, the leader process
first vacuums the indexes that don't support parallel index vacuum. The
parallel workers process the indexes that do support parallel index
vacuum, and the leader process joins as a worker after completing those
indexes. Once all indexes are processed, the parallel worker processes
exit.  After that, the leader process re-initializes the parallel
context so that it can use the same DSM for multiple passes of index
vacuum and for performing index cleanup.  For updating the index
statistics, we need to update the system table, and since updates are
not allowed during parallel mode, we update the index statistics after
exiting from the parallel mode.
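
For new reviewers, here is a compilable toy sketch of that leader-side
control flow; every function is a stub with a made-up name that just
prints the step it stands for, so only the ordering of steps should be
taken from it:

#include <stdio.h>
#include <stdbool.h>

static void collect_dead_tuples_into_dsm(void)  { puts("leader: heap scan, dead TIDs -> DSM"); }
static void launch_parallel_workers(void)       { puts("leader: launch workers"); }
static void vacuum_nonparallel_indexes(void)    { puts("leader: indexes without parallel support"); }
static void join_parallel_index_vacuum(void)    { puts("leader: act as a worker on remaining indexes"); }
static void wait_for_workers_to_exit(void)      { puts("leader: wait for workers to exit"); }
static void reinitialize_parallel_context(void) { puts("leader: reinit DSM for the next pass / cleanup"); }
static void update_index_stats_outside_parallel_mode(void) { puts("leader: update index statistics"); }

int
main(void)
{
    bool    more_index_vacuum_passes = true;    /* e.g. dead-tuple space filled up */

    while (more_index_vacuum_passes)
    {
        collect_dead_tuples_into_dsm();
        launch_parallel_workers();
        vacuum_nonparallel_indexes();
        join_parallel_index_vacuum();
        wait_for_workers_to_exit();
        reinitialize_parallel_context();
        more_index_vacuum_passes = false;       /* single pass in this sketch */
    }

    /*
     * The index cleanup phase reuses the same DSM; statistics are updated
     * only after exiting parallel mode.
     */
    update_index_stats_outside_parallel_mode();
    return 0;
}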

When the vacuum cost-based delay is enabled, even a parallel vacuum is
throttled. The basic idea of a cost-based vacuum delay for parallel
index vacuuming is to allow all parallel vacuum workers, including the
leader process, to have a shared view of cost-related parameters
(mainly VacuumCostBalance). We allow each worker to update it as and
when it has incurred any cost, and then based on that decide whether it
needs to sleep.  We allow the worker to sleep proportionally to the work
done and reduce VacuumSharedCostBalance by the amount which was
consumed by the current worker (VacuumCostBalanceLocal).  This avoids
making workers sleep when they have done less or no I/O compared to
other workers, and therefore ensures that workers that are doing more
I/O get throttled more.
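
The following is a simplified, self-contained sketch of that balancing
scheme.  It is a single-threaded stand-in with arbitrary numbers; the
20 ms factor is just a placeholder for the configured delay, and the
real code shares the balance through atomics in the DSM segment:

#include <stdio.h>

static int VacuumCostLimit = 200;       /* per-round cost budget */
static int VacuumSharedCostBalance;     /* shared by leader and workers */

/* Add cost incurred by one worker; return how long it should sleep. */
static double
maybe_sleep(int *local_balance, int cost_just_incurred)
{
    double  msec = 0;

    *local_balance += cost_just_incurred;
    VacuumSharedCostBalance += cost_just_incurred;

    if (VacuumSharedCostBalance >= VacuumCostLimit)
    {
        /* sleep proportionally to the work this worker actually did ... */
        msec = 20.0 * (double) *local_balance / VacuumCostLimit;

        /* ... and give back only what this worker consumed */
        VacuumSharedCostBalance -= *local_balance;
        *local_balance = 0;
    }
    return msec;
}

int
main(void)
{
    int     busy_worker = 0;
    int     idle_worker = 0;

    /* a worker doing lots of I/O accumulates cost quickly ... */
    for (int i = 0; i < 20; i++)
        printf("busy worker sleeps %.1f ms\n", maybe_sleep(&busy_worker, 15));

    /* ... while a worker doing little I/O barely sleeps at all */
    printf("idle worker sleeps %.1f ms\n", maybe_sleep(&idle_worker, 1));
    return 0;
}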

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Jan 9, 2020 at 5:31 PM Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hello
>
> I noticed that parallel vacuum uses min_parallel_index_scan_size GUC to skip small indexes but this is not mentioned
in documentation for both vacuum command and GUC itself.
>

Changed documentation at both places.

> +       /* Determine the number of parallel workers to launch */
> +       if (lps->lvshared->for_cleanup)
> +       {
> +               if (lps->lvshared->first_time)
> +                       nworkers = lps->nindexes_parallel_cleanup +
> +                               lps->nindexes_parallel_condcleanup - 1;
> +               else
> +                       nworkers = lps->nindexes_parallel_cleanup - 1;
> +
> +       }
> +       else
> +               nworkers = lps->nindexes_parallel_bulkdel - 1;
>
> (lazy_parallel_vacuum_indexes)
> Perhaps we need to add a comment for future readers, why we reduce the number of workers by 1. Maybe this would be
cleaner?
>

Adapted your suggestion.

>
> I have no more comments after reading the patches.
>

Thank you for reviewing the patch.

> 1.
> +    <term><replaceable class="parameter">integer</replaceable></term>
> +    <listitem>
> +     <para>
> +      Specifies a positive integer value passed to the selected option.
> +      The <replaceable class="parameter">integer</replaceable> value can
> +      also be omitted, in which case the value is decided by the command
> +      based on the option used.
> +     </para>
> +    </listitem
>
> I think, now we are supporting zero also as a degree, so it should be
> changed from "positive integer" to "positive integer(including zero)"
>

I have replaced it with "non-negative integer .."

> 2.
> + * with parallel worker processes.  Individual indexes are processed by one
> + * vacuum process.  At the beginning of a lazy vacuum (at lazy_scan_heap) we
>
> I think, above sentence should be like "Each individual index is
> processed by one vacuum process." or one worker
>

Hmm, in the above sentence "vacuum process" refers to either a leader or
a worker process, so I am not sure that what you are suggesting is an
improvement over the current wording.

> 3.
> + * Lazy vacuum supports parallel execution with parallel worker processes.  In
> + * a parallel lazy vacuum, we perform both index vacuuming and index cleanup
>
> Here, "index vacuuming" should be changed to "index vacuum" or "index
> cleanup" to "index cleaning"
>

Okay, changed at the place you mentioned and at other places where a
similar change is required.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
Hi
Thank you for update! I looked again

(vacuum_indexes_leader)
+        /* Skip the indexes that can be processed by parallel workers */
+        if (!skip_index)
+            continue;

Isn't the variable name skip_index confusing here? Maybe rename it to something like can_parallel?

Another question about behavior on temporary tables. Use case: the user runs just "vacuum;" to vacuum the entire
database (and has enough maintenance workers). Vacuum starts fine in parallel, but on the first temporary table we hit:

+    if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
+    {
+        ereport(WARNING,
+                (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
+                        RelationGetRelationName(onerel))));
+        params->nworkers = -1;
+    }

And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?

regards, Sergei



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hi
> Thank you for update! I looked again
>
> (vacuum_indexes_leader)
> +               /* Skip the indexes that can be processed by parallel workers */
> +               if (!skip_index)
> +                       continue;
>
> Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?

I also agree with your point.

>
> Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire
database(and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
 
>
> +       if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> +       {
> +               ereport(WARNING,
> +                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary
tablesin parallel",
 
> +                                               RelationGetRelationName(onerel))));
> +               params->nworkers = -1;
> +       }
>
> And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?

Good point.
Yes, we should improve this. I tried to fix it.  Attaching a delta
patch that fixes both of the comments.
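
For context, here is a tiny compilable sketch of the general idea behind
such a fix; the struct and function are made up for illustration and the
attached delta patch may differ.  The point is to decide the effective
number of workers per relation instead of permanently overwriting
params->nworkers, so later non-temporary tables can still run a parallel
vacuum:

#include <stdbool.h>
#include <stdio.h>

typedef struct VacuumParams
{
    int     nworkers;       /* requested degree; -1 means disabled */
} VacuumParams;

static int
effective_nworkers(const VacuumParams *params, bool rel_is_temp)
{
    /* temporary tables cannot use parallel vacuum, but only for this rel */
    if (rel_is_temp)
        return -1;
    return params->nworkers;
}

int
main(void)
{
    VacuumParams params = { .nworkers = 2 };

    printf("temp table:   nworkers = %d\n", effective_nworkers(&params, true));
    printf("normal table: nworkers = %d\n", effective_nworkers(&params, false));
    return 0;
}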

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
Hello

> Yes, we should improve this. I tried to fix this. Attaching a delta
> patch that is fixing both the comments.

Thank you, I have no objections.

I think the status of the CF entry is outdated and the most appropriate status for this patch is "Ready for Committer".
Changed. I also added an annotation with a link to the recently summarized results.

regards, Sergei



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:


On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> >
> > Hi
> > Thank you for update! I looked again
> >
> > (vacuum_indexes_leader)
> > +               /* Skip the indexes that can be processed by parallel workers */
> > +               if (!skip_index)
> > +                       continue;
> >
> > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
>
> I also agree with your point.

I don't think the change is a good idea.

-               bool            skip_index = (get_indstats(lps->lvshared, i) == NULL ||
-                                                                 skip_parallel_vacuum_index(Irel[i], lps->lvshared));
+               bool            can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
+                                                                       skip_parallel_vacuum_index(Irel[i],
+                                                                                                                          lps->lvshared));

The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_index and changing the comment to something like “We are interested only in indexes that are skipped by parallel vacuum”?

>
> >
> > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> >
> > +       if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > +       {
> > +               ereport(WARNING,
> > +                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > +                                               RelationGetRelationName(onerel))));
> > +               params->nworkers = -1;
> > +       }
> >
> > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
>
> Good point.
> Yes, we should improve this. I tried to fix this.

+1

Regards,


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > >
> > > Hi
> > > Thank you for update! I looked again
> > >
> > > (vacuum_indexes_leader)
> > > +               /* Skip the indexes that can be processed by parallel workers */
> > > +               if (!skip_index)
> > > +                       continue;
> > >
> > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> >
> > I also agree with your point.
>
> I don't think the change is a good idea.
>
> -               bool            skip_index = (get_indstats(lps->lvshared, i) == NULL ||
> -                                                                 skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> +               bool            can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
> +                                                                       skip_parallel_vacuum_index(Irel[i],
> +                                                                                                                          lps->lvshared));
>
> The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_index
> and change the comment to something like “We are interested in only index skipped parallel vacuum”?
>

Hmm, I find the current code and comment better than what you or
Sergei are proposing.  I am not sure what is the point of confusion in
the current code?

> >
> > >
> > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire
> > > database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> > >
> > > +       if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > +       {
> > > +               ereport(WARNING,
> > > +                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > +                                               RelationGetRelationName(onerel))));
> > > +               params->nworkers = -1;
> > > +       }
> > >
> > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
> >
> > Good point.
> > Yes, we should improve this. I tried to fix this.
>
> +1
>

Yeah, we can improve the situation here.  I think we don't need to
change the value of params->nworkers at first place if allow
lazy_scan_heap to take care of this.  Also, I think we shouldn't
display warning unless the user has explicitly asked for parallel
option.  See the fix in the attached patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > >
> > > > Hi
> > > > Thank you for update! I looked again
> > > >
> > > > (vacuum_indexes_leader)
> > > > +               /* Skip the indexes that can be processed by parallel workers */
> > > > +               if (!skip_index)
> > > > +                       continue;
> > > >
> > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > >
> > > I also agree with your point.
> >
> > I don't think the change is a good idea.
> >
> > -               bool            skip_index = (get_indstats(lps->lvshared, i) == NULL ||
> > -                                                                 skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > +               bool            can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
> > +                                                                       skip_parallel_vacuum_index(Irel[i],
> > +                                                                                                                          lps->lvshared));
> >
> > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to
> > skipped_index and change the comment to something like “We are interested in only index skipped parallel vacuum”?
> >
>
> Hmm, I find the current code and comment better than what you or
> Sergei are proposing.  I am not sure what is the point of confusion in
> the current code?

Yeah the current code is also good. I just thought they were concerned
that the variable name skip_index might be confusing because we skip
if skip_index is NOT true.

>
> > >
> > > >
> > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> > > >
> > > > +       if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > > +       {
> > > > +               ereport(WARNING,
> > > > +                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > +                                               RelationGetRelationName(onerel))));
> > > > +               params->nworkers = -1;
> > > > +       }
> > > >
> > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
> > >
> > > Good point.
> > > Yes, we should improve this. I tried to fix this.
> >
> > +1
> >
>
> Yeah, we can improve the situation here.  I think we don't need to
> change the value of params->nworkers at first place if allow
> lazy_scan_heap to take care of this.  Also, I think we shouldn't
> display warning unless the user has explicitly asked for parallel
> option.  See the fix in the attached patch.

Agreed. But with the updated patch the PARALLEL option without the
parallel degree doesn't display warning because params->nworkers = 0
in that case. So how about restoring params->nworkers at the end of
vacuum_rel()?

+                       /*
+                        * Give warning only if the user explicitly tries to perform a
+                        * parallel vacuum on the temporary table.
+                        */
+                       if (params->nworkers > 0)
+                               ereport(WARNING,
+                                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
+                                                               RelationGetRelationName(onerel))));
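To sketch the restore idea (rough and untested; the exact spot inside vacuum_rel() is an assumption on my side):

    /* near the top of vacuum_rel(), remember what the caller asked for */
    int         save_nworkers = params->nworkers;

    /* ... the temporary-table case may then set params->nworkers = -1 ... */

    /* at the end of vacuum_rel(), restore it so the next relation is unaffected */
    params->nworkers = save_nworkers;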

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Sat, 11 Jan 2020 at 19:48, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > > >
> > > > > Hi
> > > > > Thank you for update! I looked again
> > > > >
> > > > > (vacuum_indexes_leader)
> > > > > +               /* Skip the indexes that can be processed by parallel workers */
> > > > > +               if (!skip_index)
> > > > > +                       continue;
> > > > >
> > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > >
> > > > I also agree with your point.
> > >
> > > I don't think the change is a good idea.
> > >
> > > -               bool            skip_index = (get_indstats(lps->lvshared, i) == NULL ||
> > > -                                                                 skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > > +               bool            can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
> > > +                                                                       skip_parallel_vacuum_index(Irel[i],
> > > +                                                                                                                          lps->lvshared));
> > >
> > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_index and change the comment to something like “We are interested in only index skipped parallel vacuum”?
> > >
> >
> > Hmm, I find the current code and comment better than what you or
> > Sergei are proposing.  I am not sure what is the point of confusion in
> > the current code?
>
> Yeah the current code is also good. I just thought they were concerned
> that the variable name skip_index might be confusing because we skip
> if skip_index is NOT true.
>
> >
> > > >
> > > > >
> > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> > > > >
> > > > > +       if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > > > +       {
> > > > > +               ereport(WARNING,
> > > > > +                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > > +                                               RelationGetRelationName(onerel))));
> > > > > +               params->nworkers = -1;
> > > > > +       }
> > > > >
> > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
> > > >
> > > > Good point.
> > > > Yes, we should improve this. I tried to fix this.
> > >
> > > +1
> > >
> >
> > Yeah, we can improve the situation here.  I think we don't need to
> > change the value of params->nworkers at first place if allow
> > lazy_scan_heap to take care of this.  Also, I think we shouldn't
> > display warning unless the user has explicitly asked for parallel
> > option.  See the fix in the attached patch.
>
> Agreed. But with the updated patch the PARALLEL option without the
> parallel degree doesn't display warning because params->nworkers = 0
> in that case. So how about restoring params->nworkers at the end of
> vacuum_rel()?
>
> +                       /*
> +                        * Give warning only if the user explicitly tries to perform a
> +                        * parallel vacuum on the temporary table.
> +                        */
> +                       if (params->nworkers > 0)
> +                               ereport(WARNING,
> +                                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> +                                                               RelationGetRelationName(onerel))));

Hi,
I have some doubts about the warning for temporary tables. Below are some examples.

Suppose we have one temporary table with the name "temp_table".
Case 1:
vacuum;
I think, in this case, we should not give any warning for the temp table. We should do a parallel vacuum (considering zero as the parallel degree) for all the tables except temporary tables.

Case 2:
vacuum (parallel);

Case 3:
vacuum(parallel 5);

Case 4:
vacuum(parallel) temp_table;

Case 5:
vacuum(parallel 4) temp_table;

I think, for cases 2 and 4, as per the new design, we should give an error (ERROR: parallel degree should be specified between 0 and 1024), because parallel vacuum is ON by default, so if the user gives the parallel option without a degree, we can give an error.
If we can give an error for cases 2 and 4, then we can give a warning for cases 3 and 5.

Thoughts?

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > > >
> > > > > Hi
> > > > > Thank you for update! I looked again
> > > > >
> > > > > (vacuum_indexes_leader)
> > > > > +               /* Skip the indexes that can be processed by parallel workers */
> > > > > +               if (!skip_index)
> > > > > +                       continue;
> > > > >
> > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > >
> > > > I also agree with your point.
> > >
> > > I don't think the change is a good idea.
> > >
> > > -               bool            skip_index = (get_indstats(lps->lvshared, i) == NULL ||
> > > -                                                                 skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > > +               bool            can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
> > > +                                                                       skip_parallel_vacuum_index(Irel[i],
> > > +                                                                                                                          lps->lvshared));
> > >
> > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to
> > > skipped_index and change the comment to something like “We are interested in only index skipped parallel vacuum”?
> > >
> >
> > Hmm, I find the current code and comment better than what you or
> > Sergei are proposing.  I am not sure what is the point of confusion in
> > the current code?
>
> Yeah the current code is also good. I just thought they were concerned
> that the variable name skip_index might be confusing because we skip
> if skip_index is NOT true.
>

Okay, would it better if we get rid of this variable and have code like below?

/* Skip the indexes that can be processed by parallel workers */
if ( !(get_indstats(lps->lvshared, i) == NULL ||
skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
    continue;
...

> >
> > > >
> > > > >
> > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> > > > >
> > > > > +       if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > > > +       {
> > > > > +               ereport(WARNING,
> > > > > +                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > > +                                               RelationGetRelationName(onerel))));
> > > > > +               params->nworkers = -1;
> > > > > +       }
> > > > >
> > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
> > > >
> > > > Good point.
> > > > Yes, we should improve this. I tried to fix this.
> > >
> > > +1
> > >
> >
> > Yeah, we can improve the situation here.  I think we don't need to
> > change the value of params->nworkers at first place if allow
> > lazy_scan_heap to take care of this.  Also, I think we shouldn't
> > display warning unless the user has explicitly asked for parallel
> > option.  See the fix in the attached patch.
>
> Agreed. But with the updated patch the PARALLEL option without the
> parallel degree doesn't display warning because params->nworkers = 0
> in that case. So how about restoring params->nworkers at the end of
> vacuum_rel()?
>

I had also thought on those lines, but I was not entirely sure about
this resetting of workers.  Today, again thinking about it, it seems
the idea Mahendra is suggesting that is giving an error if the
parallel degree is not specified seems reasonable to me.  This means
Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an
error "parallel degree must be specified".  This idea has merit as now
we are supporting a parallel vacuum by default, so a 'parallel' option
without a parallel degree doesn't have any meaning.  If we do that,
then we don't need to do anything additional about the handling of
temp tables (other than what patch is already doing) as well.  What do
you think?
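Just to sketch what I have in mind, the check could sit in the options loop of ExecVacuum(), roughly like this (placement, variable names, and the exact message are assumptions, not the final patch):

    else if (strcmp(opt->defname, "parallel") == 0)
    {
        if (opt->arg == NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_SYNTAX_ERROR),
                     errmsg("parallel degree must be specified"),
                     parser_errposition(pstate, opt->location)));
        nworkers = defGetInt32(opt);
    }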



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Jan 13, 2020 at 9:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > > >
> > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > > > >
> > > > > > Hi
> > > > > > Thank you for update! I looked again
> > > > > >
> > > > > > (vacuum_indexes_leader)
> > > > > > +               /* Skip the indexes that can be processed by parallel workers */
> > > > > > +               if (!skip_index)
> > > > > > +                       continue;
> > > > > >
> > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > > >
> > > > > I also agree with your point.
> > > >
> > > > I don't think the change is a good idea.
> > > >
> > > > -               bool            skip_index = (get_indstats(lps->lvshared, i) == NULL ||
> > > > -                                                                 skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > > > +               bool            can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
> > > > +                                                                       skip_parallel_vacuum_index(Irel[i],
> > > > +                                                                                                                          lps->lvshared));
> > > >
> > > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to
> > > > skipped_index and change the comment to something like “We are interested in only index skipped parallel vacuum”?
> > > >
> > >
> > > Hmm, I find the current code and comment better than what you or
> > > Sergei are proposing.  I am not sure what is the point of confusion in
> > > the current code?
> >
> > Yeah the current code is also good. I just thought they were concerned
> > that the variable name skip_index might be confusing because we skip
> > if skip_index is NOT true.
> >
>
> Okay, would it better if we get rid of this variable and have code like below?
>
> /* Skip the indexes that can be processed by parallel workers */
> if ( !(get_indstats(lps->lvshared, i) == NULL ||
> skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
>     continue;
> ...
>
> > >
> > > > >
> > > > > >
> > > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> > > > > >
> > > > > > +       if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > > > > +       {
> > > > > > +               ereport(WARNING,
> > > > > > +                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > > > +                                               RelationGetRelationName(onerel))));
> > > > > > +               params->nworkers = -1;
> > > > > > +       }
> > > > > >
> > > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
> > > > >
> > > > > Good point.
> > > > > Yes, we should improve this. I tried to fix this.
> > > >
> > > > +1
> > > >
> > >
> > > Yeah, we can improve the situation here.  I think we don't need to
> > > change the value of params->nworkers at first place if allow
> > > lazy_scan_heap to take care of this.  Also, I think we shouldn't
> > > display warning unless the user has explicitly asked for parallel
> > > option.  See the fix in the attached patch.
> >
> > Agreed. But with the updated patch the PARALLEL option without the
> > parallel degree doesn't display warning because params->nworkers = 0
> > in that case. So how about restoring params->nworkers at the end of
> > vacuum_rel()?
> >
>
> I had also thought on those lines, but I was not entirely sure about
> this resetting of workers.  Today, again thinking about it, it seems
> the idea Mahendra is suggesting that is giving an error if the
> parallel degree is not specified seems reasonable to me.  This means
> Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an
> error "parallel degree must be specified".  This idea has merit as now
> we are supporting a parallel vacuum by default, so a 'parallel' option
> without a parallel degree doesn't have any meaning.  If we do that,
> then we don't need to do anything additional about the handling of
> temp tables (other than what patch is already doing) as well.  What do
> you think?
>
This idea makes sense to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Sergei Kornilov
Date:
Hello

> I just thought they were concerned
> that the variable name skip_index might be confusing because we skip
> if skip_index is NOT true.

Right.

>>  > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL ||
>>  > - skip_parallel_vacuum_index(Irel[i], lps->lvshared));
>>  > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
>>  > + skip_parallel_vacuum_index(Irel[i],
>>  > + lps->lvshared));
>>  >
>>  > The above condition is true when the index can *not* do parallel index vacuum.

Ouch, right. I was wrong. (or the variable name and the comment really confused me)

> Okay, would it better if we get rid of this variable and have code like below?
>
> /* Skip the indexes that can be processed by parallel workers */
> if ( !(get_indstats(lps->lvshared, i) == NULL ||
> skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
>     continue;

Complex condition... Not sure.

> How about changing it to skipped_index and change the comment to something like “We are interested in only index
> skipped parallel vacuum”?

I prefer this idea.

> Today, again thinking about it, it seems
> the idea Mahendra is suggesting that is giving an error if the
> parallel degree is not specified seems reasonable to me.

+1

regards, Sergei



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Jan 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > What do you think of the attached?  Sawada-san, kindly verify the
> > > changes and let me know your opinion.
> >
> > I agreed to not include both the FAST option patch and
> > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus
> > on the main part and we can discuss and add them later if want.
> >
> > I've looked at the latest version patch you shared. Overall it looks
> > good and works fine. I have a few small comments:
> >
>
> I have addressed all your comments and slightly change nearby comments
> and ran pgindent.  I think we can commit the first two preparatory
> patches now unless you or someone else has any more comments on those.
>

I have pushed the first one (4e514c6) and I am planning to commit the
next one (API: v46-0001-Introduce-IndexAM-fields-for-parallel-vacuum)
on Wednesday.  We are still discussing a few things for the main
parallel vacuum patch
(v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel), on which
we should reach a conclusion soon. In the attached, I have made a few
changes to the comments of patch
v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hi
> Thank you for update! I looked again
>
> (vacuum_indexes_leader)
> +               /* Skip the indexes that can be processed by parallel workers */
> +               if (!skip_index)
> +                       continue;
>
> Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
>

Again I looked into code and thought that somehow if we can add a
boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
identify that this index is supporting parallel vacuum or not, then it
will be easy to skip those indexes and multiple time we will not call
skip_parallel_vacuum_index (from vacuum_indexes_leader and
parallel_vacuum_index)
We can have a linked list of non-parallel supported indexes, then
directly we can pass to vacuum_indexes_leader.

Ex: let suppose we have 5 indexes into a table.  If before launching
parallel workers, if we can add boolean flag(can_parallel)
IndexBulkDeleteResult structure to identify that this index is
supporting parallel vacuum or not.
Let index 1, 4 are not supporting parallel vacuum so we already have
info in a linked list that 1->4 are not supporting parallel vacuum, so
parallel_vacuum_index will process these indexes and rest will be
processed by parallel workers. If parallel worker found that
can_parallel is false, then it will skip that index.

As per my understanding, if we implement this, then we can avoid
multiple function calling of skip_parallel_vacuum_index and if there
is no index which can't  performe parallel vacuum, then we will not
call vacuum_indexes_leader as head of list pointing to null. (we can
save unnecessary calling of vacuum_indexes_leader)

Thoughts?

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 13 Jan 2020 at 12:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > > >
> > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > > > >
> > > > > > Hi
> > > > > > Thank you for update! I looked again
> > > > > >
> > > > > > (vacuum_indexes_leader)
> > > > > > +               /* Skip the indexes that can be processed by parallel workers */
> > > > > > +               if (!skip_index)
> > > > > > +                       continue;
> > > > > >
> > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > > >
> > > > > I also agree with your point.
> > > >
> > > > I don't think the change is a good idea.
> > > >
> > > > -               bool            skip_index = (get_indstats(lps->lvshared, i) == NULL ||
> > > > -                                                                 skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > > > +               bool            can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
> > > > +                                                                       skip_parallel_vacuum_index(Irel[i],
> > > > +                                                                                                                          lps->lvshared));
> > > >
> > > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to
> > > > skipped_index and change the comment to something like “We are interested in only index skipped parallel vacuum”?
> > > >
> > >
> > > Hmm, I find the current code and comment better than what you or
> > > Sergei are proposing.  I am not sure what is the point of confusion in
> > > the current code?
> >
> > Yeah the current code is also good. I just thought they were concerned
> > that the variable name skip_index might be confusing because we skip
> > if skip_index is NOT true.
> >
>
> Okay, would it better if we get rid of this variable and have code like below?
>
> /* Skip the indexes that can be processed by parallel workers */
> if ( !(get_indstats(lps->lvshared, i) == NULL ||
> skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
>     continue;

Make sense to me.

> ...
>
> > >
> > > > >
> > > > > >
> > > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> > > > > >
> > > > > > +       if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > > > > +       {
> > > > > > +               ereport(WARNING,
> > > > > > +                               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > > > +                                               RelationGetRelationName(onerel))));
> > > > > > +               params->nworkers = -1;
> > > > > > +       }
> > > > > >
> > > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
> > > > >
> > > > > Good point.
> > > > > Yes, we should improve this. I tried to fix this.
> > > >
> > > > +1
> > > >
> > >
> > > Yeah, we can improve the situation here.  I think we don't need to
> > > change the value of params->nworkers at first place if allow
> > > lazy_scan_heap to take care of this.  Also, I think we shouldn't
> > > display warning unless the user has explicitly asked for parallel
> > > option.  See the fix in the attached patch.
> >
> > Agreed. But with the updated patch the PARALLEL option without the
> > parallel degree doesn't display warning because params->nworkers = 0
> > in that case. So how about restoring params->nworkers at the end of
> > vacuum_rel()?
> >
>
> I had also thought on those lines, but I was not entirely sure about
> this resetting of workers.  Today, again thinking about it, it seems
> the idea Mahendra is suggesting that is giving an error if the
> parallel degree is not specified seems reasonable to me.  This means
> Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an
> error "parallel degree must be specified".  This idea has merit as now
> we are supporting a parallel vacuum by default, so a 'parallel' option
> without a parallel degree doesn't have any meaning.  If we do that,
> then we don't need to do anything additional about the handling of
> temp tables (other than what patch is already doing) as well.  What do
> you think?
>

Good point! Agreed.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> >
> > Hi
> > Thank you for update! I looked again
> >
> > (vacuum_indexes_leader)
> > +               /* Skip the indexes that can be processed by parallel workers */
> > +               if (!skip_index)
> > +                       continue;
> >
> > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> >
>
> Again I looked into code and thought that somehow if we can add a
> boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> identify that this index is supporting parallel vacuum or not, then it
> will be easy to skip those indexes and multiple time we will not call
> skip_parallel_vacuum_index (from vacuum_indexes_leader and
> parallel_vacuum_index)
> We can have a linked list of non-parallel supported indexes, then
> directly we can pass to vacuum_indexes_leader.
>
> Ex: let suppose we have 5 indexes into a table.  If before launching
> parallel workers, if we can add boolean flag(can_parallel)
> IndexBulkDeleteResult structure to identify that this index is
> supporting parallel vacuum or not.
> Let index 1, 4 are not supporting parallel vacuum so we already have
> info in a linked list that 1->4 are not supporting parallel vacuum, so
> parallel_vacuum_index will process these indexes and rest will be
> processed by parallel workers. If parallel worker found that
> can_parallel is false, then it will skip that index.
>
> As per my understanding, if we implement this, then we can avoid
> multiple function calling of skip_parallel_vacuum_index and if there
> is no index which can't  performe parallel vacuum, then we will not
> call vacuum_indexes_leader as head of list pointing to null. (we can
> save unnecessary calling of vacuum_indexes_leader)
>
> Thoughts?
>

We skip not only indexes that don't support parallel index vacuum but
also indexes supporting it depending on vacuum phase. That is, we
could skip different indexes at different vacuum phase. Therefore with
your idea, we would need to have at least three linked lists for each
possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
that right?

I think we can check if there are indexes that should be processed by
the leader process before entering the loop in vacuum_indexes_leader
by comparing nindexes_parallel_XXX of LVParallelState to the number of
indexes but I'm not sure it's effective since the number of indexes on
a table should be small.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > >
> > > Hi
> > > Thank you for update! I looked again
> > >
> > > (vacuum_indexes_leader)
> > > +               /* Skip the indexes that can be processed by parallel workers */
> > > +               if (!skip_index)
> > > +                       continue;
> > >
> > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > >
> >
> > Again I looked into code and thought that somehow if we can add a
> > boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> > identify that this index is supporting parallel vacuum or not, then it
> > will be easy to skip those indexes and multiple time we will not call
> > skip_parallel_vacuum_index (from vacuum_indexes_leader and
> > parallel_vacuum_index)
> > We can have a linked list of non-parallel supported indexes, then
> > directly we can pass to vacuum_indexes_leader.
> >
> > Ex: let suppose we have 5 indexes into a table.  If before launching
> > parallel workers, if we can add boolean flag(can_parallel)
> > IndexBulkDeleteResult structure to identify that this index is
> > supporting parallel vacuum or not.
> > Let index 1, 4 are not supporting parallel vacuum so we already have
> > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > parallel_vacuum_index will process these indexes and rest will be
> > processed by parallel workers. If parallel worker found that
> > can_parallel is false, then it will skip that index.
> >
> > As per my understanding, if we implement this, then we can avoid
> > multiple function calling of skip_parallel_vacuum_index and if there
> > is no index which can't  performe parallel vacuum, then we will not
> > call vacuum_indexes_leader as head of list pointing to null. (we can
> > save unnecessary calling of vacuum_indexes_leader)
> >
> > Thoughts?
> >
>
> We skip not only indexes that don't support parallel index vacuum but
> also indexes supporting it depending on vacuum phase. That is, we
> could skip different indexes at different vacuum phase. Therefore with
> your idea, we would need to have at least three linked lists for each
> possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> that right?
>
> I think we can check if there are indexes that should be processed by
> the leader process before entering the loop in vacuum_indexes_leader
> by comparing nindexes_parallel_XXX of LVParallelState to the number of
> indexes but I'm not sure it's effective since the number of indexes on
> a table should be small.
>

Hi,

+    /*
+     * Try to initialize the parallel vacuum if requested
+     */
+    if (params->nworkers >= 0 && vacrelstats->useindex)
+    {
+        /*
+         * Since parallel workers cannot access data in temporary tables, we
+         * can't perform parallel vacuum on them.
+         */
+        if (RelationUsesLocalBuffers(onerel))
+        {
+            /*
+             * Give warning only if the user explicitly tries to perform a
+             * parallel vacuum on the temporary table.
+             */
+            if (params->nworkers > 0)
+                ereport(WARNING,
+                        (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",

From v45 patch, we moved warning of temporary table into
"params->nworkers >= 0 && vacrelstats->useindex)" check so if table
don't have any index, then we are not giving any warning. I think, we
should give warning for all the temporary tables if parallel degree is
given. (Till v44 patch, we were giving warning for all the temporary
tables(having index and without index))

Thoughts?
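For example, something like the following ordering (only a sketch of the idea, not a proper patch):

    /* handle the temporary-table case up front */
    if (RelationUsesLocalBuffers(onerel))
    {
        /* warn for every temporary table, whether or not it has indexes */
        if (params->nworkers > 0)
            ereport(WARNING,
                    (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
                            RelationGetRelationName(onerel))));
    }
    else if (params->nworkers >= 0 && vacrelstats->useindex)
    {
        /* try to initialize the parallel vacuum as before */
    }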

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > >
> > > > Hi
> > > > Thank you for update! I looked again
> > > >
> > > > (vacuum_indexes_leader)
> > > > +               /* Skip the indexes that can be processed by parallel workers */
> > > > +               if (!skip_index)
> > > > +                       continue;
> > > >
> > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > >
> > >
> > > Again I looked into code and thought that somehow if we can add a
> > > boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> > > identify that this index is supporting parallel vacuum or not, then it
> > > will be easy to skip those indexes and multiple time we will not call
> > > skip_parallel_vacuum_index (from vacuum_indexes_leader and
> > > parallel_vacuum_index)
> > > We can have a linked list of non-parallel supported indexes, then
> > > directly we can pass to vacuum_indexes_leader.
> > >
> > > Ex: let suppose we have 5 indexes into a table.  If before launching
> > > parallel workers, if we can add boolean flag(can_parallel)
> > > IndexBulkDeleteResult structure to identify that this index is
> > > supporting parallel vacuum or not.
> > > Let index 1, 4 are not supporting parallel vacuum so we already have
> > > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > > parallel_vacuum_index will process these indexes and rest will be
> > > processed by parallel workers. If parallel worker found that
> > > can_parallel is false, then it will skip that index.
> > >
> > > As per my understanding, if we implement this, then we can avoid
> > > multiple function calling of skip_parallel_vacuum_index and if there
> > > is no index which can't  performe parallel vacuum, then we will not
> > > call vacuum_indexes_leader as head of list pointing to null. (we can
> > > save unnecessary calling of vacuum_indexes_leader)
> > >
> > > Thoughts?
> > >
> >
> > We skip not only indexes that don't support parallel index vacuum but
> > also indexes supporting it depending on vacuum phase. That is, we
> > could skip different indexes at different vacuum phase. Therefore with
> > your idea, we would need to have at least three linked lists for each
> > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> > that right?
> >
> > I think we can check if there are indexes that should be processed by
> > the leader process before entering the loop in vacuum_indexes_leader
> > by comparing nindexes_parallel_XXX of LVParallelState to the number of
> > indexes but I'm not sure it's effective since the number of indexes on
> > a table should be small.
> >
>
> Hi,
>
> +    /*
> +     * Try to initialize the parallel vacuum if requested
> +     */
> +    if (params->nworkers >= 0 && vacrelstats->useindex)
> +    {
> +        /*
> +         * Since parallel workers cannot access data in temporary tables, we
> +         * can't perform parallel vacuum on them.
> +         */
> +        if (RelationUsesLocalBuffers(onerel))
> +        {
> +            /*
> +             * Give warning only if the user explicitly tries to perform a
> +             * parallel vacuum on the temporary table.
> +             */
> +            if (params->nworkers > 0)
> +                ereport(WARNING,
> +                        (errmsg("disabling parallel option of vacuum
> on \"%s\" --- cannot vacuum temporary tables in parallel",
>
> From v45 patch, we moved warning of temporary table into
> "params->nworkers >= 0 && vacrelstats->useindex)" check so if table
> don't have any index, then we are not giving any warning. I think, we
> should give warning for all the temporary tables if parallel degree is
> given. (Till v44 patch, we were giving warning for all the temporary
> tables(having index and without index))
>
> Thoughts?

Hi,
I did some more review.  Below is the 1 review comment for v46-0002.

+    /*
+     * Initialize the state for parallel vacuum
+     */
+    if (params->nworkers >= 0 && vacrelstats->useindex)
+    {
+        /*
+         * Since parallel workers cannot access data in temporary tables, we
+         * can't perform parallel vacuum on them.
+         */
+        if (RelationUsesLocalBuffers(onerel)

In above check, we should add "nindexes > 1" check so that if there is only 1 index, then we will not call begin_parallel_vacuum.

"Initialize the state for parallel vacuum",we can improve this comment by mentioning that what are doing here. (If table has more than index and parallel vacuum is requested, then try to start parallel vacuum).

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 14, 2020 at 10:04 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 13 Jan 2020 at 12:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Okay, would it better if we get rid of this variable and have code like below?
> >
> > /* Skip the indexes that can be processed by parallel workers */
> > if ( !(get_indstats(lps->lvshared, i) == NULL ||
> > skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
> >     continue;
>
> Make sense to me.
>

I have changed the comment and condition to make it a positive test so
that it is more clear.
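Roughly, the positive form is along these lines (the exact variable name and comment in the attached version may differ):

    bool        leader_only = (get_indstats(lps->lvshared, i) == NULL ||
                               skip_parallel_vacuum_index(Irel[i], lps->lvshared));

    /* Process the indexes that can be vacuumed only by the leader */
    if (!leader_only)
        continue;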

> > ...
> > > Agreed. But with the updated patch the PARALLEL option without the
> > > parallel degree doesn't display warning because params->nworkers = 0
> > > in that case. So how about restoring params->nworkers at the end of
> > > vacuum_rel()?
> > >
> >
> > I had also thought on those lines, but I was not entirely sure about
> > this resetting of workers.  Today, again thinking about it, it seems
> > the idea Mahendra is suggesting that is giving an error if the
> > parallel degree is not specified seems reasonable to me.  This means
> > Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an
> > error "parallel degree must be specified".  This idea has merit as now
> > we are supporting a parallel vacuum by default, so a 'parallel' option
> > without a parallel degree doesn't have any meaning.  If we do that,
> > then we don't need to do anything additional about the handling of
> > temp tables (other than what patch is already doing) as well.  What do
> > you think?
> >
>
> Good point! Agreed.
>

Thanks, changed accordingly.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 14, 2020 at 4:17 PM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> Hi,
>
> +    /*
> +     * Try to initialize the parallel vacuum if requested
> +     */
> +    if (params->nworkers >= 0 && vacrelstats->useindex)
> +    {
> +        /*
> +         * Since parallel workers cannot access data in temporary tables, we
> +         * can't perform parallel vacuum on them.
> +         */
> +        if (RelationUsesLocalBuffers(onerel))
> +        {
> +            /*
> +             * Give warning only if the user explicitly tries to perform a
> +             * parallel vacuum on the temporary table.
> +             */
> +            if (params->nworkers > 0)
> +                ereport(WARNING,
> +                        (errmsg("disabling parallel option of vacuum
> on \"%s\" --- cannot vacuum temporary tables in parallel",
>
> From v45 patch, we moved warning of temporary table into
> "params->nworkers >= 0 && vacrelstats->useindex)" check so if table
> don't have any index, then we are not giving any warning. I think, we
> should give warning for all the temporary tables if parallel degree is
> given. (Till v44 patch, we were giving warning for all the temporary
> tables(having index and without index))
>

I am not sure how useful it is to give a WARNING in this case, as we are
anyway not going to perform a parallel vacuum because the table doesn't have
an index.  One could also argue that a WARNING is expected whenever we
skip a parallel vacuum for any reason (e.g., if the size of
the index is small), but I don't think that would be a good idea.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Tue, 14 Jan 2020 at 17:16, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > > >
> > > > > Hi
> > > > > Thank you for update! I looked again
> > > > >
> > > > > (vacuum_indexes_leader)
> > > > > +               /* Skip the indexes that can be processed by parallel workers */
> > > > > +               if (!skip_index)
> > > > > +                       continue;
> > > > >
> > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > > >
> > > >
> > > > Again I looked into code and thought that somehow if we can add a
> > > > boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> > > > identify that this index is supporting parallel vacuum or not, then it
> > > > will be easy to skip those indexes and multiple time we will not call
> > > > skip_parallel_vacuum_index (from vacuum_indexes_leader and
> > > > parallel_vacuum_index)
> > > > We can have a linked list of non-parallel supported indexes, then
> > > > directly we can pass to vacuum_indexes_leader.
> > > >
> > > > Ex: let suppose we have 5 indexes into a table.  If before launching
> > > > parallel workers, if we can add boolean flag(can_parallel)
> > > > IndexBulkDeleteResult structure to identify that this index is
> > > > supporting parallel vacuum or not.
> > > > Let index 1, 4 are not supporting parallel vacuum so we already have
> > > > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > > > parallel_vacuum_index will process these indexes and rest will be
> > > > processed by parallel workers. If parallel worker found that
> > > > can_parallel is false, then it will skip that index.
> > > >
> > > > As per my understanding, if we implement this, then we can avoid
> > > > multiple function calling of skip_parallel_vacuum_index and if there
> > > > is no index which can't  performe parallel vacuum, then we will not
> > > > call vacuum_indexes_leader as head of list pointing to null. (we can
> > > > save unnecessary calling of vacuum_indexes_leader)
> > > >
> > > > Thoughts?
> > > >
> > >
> > > We skip not only indexes that don't support parallel index vacuum but
> > > also indexes supporting it depending on vacuum phase. That is, we
> > > could skip different indexes at different vacuum phase. Therefore with
> > > your idea, we would need to have at least three linked lists for each
> > > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> > > that right?
> > >
> > > I think we can check if there are indexes that should be processed by
> > > the leader process before entering the loop in vacuum_indexes_leader
> > > by comparing nindexes_parallel_XXX of LVParallelState to the number of
> > > indexes but I'm not sure it's effective since the number of indexes on
> > > a table should be small.
> > >
> >
> > Hi,
> >
> > +    /*
> > +     * Try to initialize the parallel vacuum if requested
> > +     */
> > +    if (params->nworkers >= 0 && vacrelstats->useindex)
> > +    {
> > +        /*
> > +         * Since parallel workers cannot access data in temporary tables, we
> > +         * can't perform parallel vacuum on them.
> > +         */
> > +        if (RelationUsesLocalBuffers(onerel))
> > +        {
> > +            /*
> > +             * Give warning only if the user explicitly tries to perform a
> > +             * parallel vacuum on the temporary table.
> > +             */
> > +            if (params->nworkers > 0)
> > +                ereport(WARNING,
> > +                        (errmsg("disabling parallel option of vacuum
> > on \"%s\" --- cannot vacuum temporary tables in parallel",
> >
> > From v45 patch, we moved warning of temporary table into
> > "params->nworkers >= 0 && vacrelstats->useindex)" check so if table
> > don't have any index, then we are not giving any warning. I think, we
> > should give warning for all the temporary tables if parallel degree is
> > given. (Till v44 patch, we were giving warning for all the temporary
> > tables(having index and without index))
> >
> > Thoughts?
>
> Hi,
> I did some more review.  Below is the 1 review comment for v46-0002.
>
> +    /*
> +     * Initialize the state for parallel vacuum
> +     */
> +    if (params->nworkers >= 0 && vacrelstats->useindex)
> +    {
> +        /*
> +         * Since parallel workers cannot access data in temporary tables, we
> +         * can't perform parallel vacuum on them.
> +         */
> +        if (RelationUsesLocalBuffers(onerel)
>
> In above check, we should add "nindexes > 1" check so that if there is only 1 index, then we will not call begin_parallel_vacuum.

I think, " if (params->nworkers >= 0 && nindexes > 1)" check will be
enough here .

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 14 Jan 2020 at 21:43, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 14, 2020 at 10:04 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 13 Jan 2020 at 12:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Okay, would it better if we get rid of this variable and have code like below?
> > >
> > > /* Skip the indexes that can be processed by parallel workers */
> > > if ( !(get_indstats(lps->lvshared, i) == NULL ||
> > > skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
> > >     continue;
> >
> > Make sense to me.
> >
>
> I have changed the comment and condition to make it a positive test so
> that it is more clear.
>
> > > ...
> > > > Agreed. But with the updated patch the PARALLEL option without the
> > > > parallel degree doesn't display warning because params->nworkers = 0
> > > > in that case. So how about restoring params->nworkers at the end of
> > > > vacuum_rel()?
> > > >
> > >
> > > I had also thought on those lines, but I was not entirely sure about
> > > this resetting of workers.  Today, again thinking about it, it seems
> > > the idea Mahendra is suggesting that is giving an error if the
> > > parallel degree is not specified seems reasonable to me.  This means
> > > Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an
> > > error "parallel degree must be specified".  This idea has merit as now
> > > we are supporting a parallel vacuum by default, so a 'parallel' option
> > > without a parallel degree doesn't have any meaning.  If we do that,
> > > then we don't need to do anything additional about the handling of
> > > temp tables (other than what patch is already doing) as well.  What do
> > > you think?
> > >
> >
> > Good point! Agreed.
> >
>
> Thanks, changed accordingly.
>

Thank you for updating the patch! I have a few small comments. The
rest looks good to me.

1.
+ * Compute the number of parallel worker processes to request.  Both index
+ * vacuum and index cleanup can be executed with parallel workers.  The
+ * relation size of the table don't affect the parallel degree for now.

s/don't/doesn't/

2.
@@ -383,6 +435,7 @@ vacuum(List *relations, VacuumParams *params,
        VacuumPageHit = 0;
        VacuumPageMiss = 0;
        VacuumPageDirty = 0;
+       VacuumSharedCostBalance = NULL;

I think we can initialize VacuumCostBalanceLocal and
VacuumActiveNWorkers here. We use these parameters during parallel
index vacuum and reset at the end but we might want to initialize them
for safety.

3.
+   /* Set cost-based vacuum delay */
+   VacuumCostActive = (VacuumCostDelay > 0);
+   VacuumCostBalance = 0;
+   VacuumPageHit = 0;
+   VacuumPageMiss = 0;
+   VacuumPageDirty = 0;
+   VacuumSharedCostBalance = &(lvshared->cost_balance);
+   VacuumActiveNWorkers = &(lvshared->active_nworkers);

VacuumCostBalanceLocal also needs to be initialized.

4.
The regression tests don't have a test case for PARALLEL 0.

Since I guess you have already modified the code locally, I've attached a
diff containing the above review comments.
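For comments 2 and 3, what I have in mind is roughly the following (a sketch only; the attached diff may place things slightly differently):

    /* in vacuum(): also reset the parallel vacuum state for safety */
    VacuumSharedCostBalance = NULL;
    VacuumCostBalanceLocal = 0;
    VacuumActiveNWorkers = NULL;

    /* and in the parallel worker's setup of cost-based delay */
    VacuumCostBalanceLocal = 0;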

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 15 Jan 2020 at 12:34, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Tue, 14 Jan 2020 at 17:16, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > > >
> > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > > > >
> > > > > > Hi
> > > > > > Thank you for update! I looked again
> > > > > >
> > > > > > (vacuum_indexes_leader)
> > > > > > +               /* Skip the indexes that can be processed by parallel workers */
> > > > > > +               if (!skip_index)
> > > > > > +                       continue;
> > > > > >
> > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > > > >
> > > > >
> > > > > Again I looked into code and thought that somehow if we can add a
> > > > > boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> > > > > identify that this index is supporting parallel vacuum or not, then it
> > > > > will be easy to skip those indexes and multiple time we will not call
> > > > > skip_parallel_vacuum_index (from vacuum_indexes_leader and
> > > > > parallel_vacuum_index)
> > > > > We can have a linked list of non-parallel supported indexes, then
> > > > > directly we can pass to vacuum_indexes_leader.
> > > > >
> > > > > Ex: let suppose we have 5 indexes into a table.  If before launching
> > > > > parallel workers, if we can add boolean flag(can_parallel)
> > > > > IndexBulkDeleteResult structure to identify that this index is
> > > > > supporting parallel vacuum or not.
> > > > > Let index 1, 4 are not supporting parallel vacuum so we already have
> > > > > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > > > > parallel_vacuum_index will process these indexes and rest will be
> > > > > processed by parallel workers. If parallel worker found that
> > > > > can_parallel is false, then it will skip that index.
> > > > >
> > > > > As per my understanding, if we implement this, then we can avoid
> > > > > multiple function calling of skip_parallel_vacuum_index and if there
> > > > > is no index which can't  performe parallel vacuum, then we will not
> > > > > call vacuum_indexes_leader as head of list pointing to null. (we can
> > > > > save unnecessary calling of vacuum_indexes_leader)
> > > > >
> > > > > Thoughts?
> > > > >
> > > >
> > > > We skip not only indexes that don't support parallel index vacuum but
> > > > also indexes supporting it depending on vacuum phase. That is, we
> > > > could skip different indexes at different vacuum phase. Therefore with
> > > > your idea, we would need to have at least three linked lists for each
> > > > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> > > > that right?
> > > >
> > > > I think we can check if there are indexes that should be processed by
> > > > the leader process before entering the loop in vacuum_indexes_leader
> > > > by comparing nindexes_parallel_XXX of LVParallelState to the number of
> > > > indexes but I'm not sure it's effective since the number of indexes on
> > > > a table should be small.
> > > >
> > >
> > > Hi,
> > >
> > > +    /*
> > > +     * Try to initialize the parallel vacuum if requested
> > > +     */
> > > +    if (params->nworkers >= 0 && vacrelstats->useindex)
> > > +    {
> > > +        /*
> > > +         * Since parallel workers cannot access data in temporary tables, we
> > > +         * can't perform parallel vacuum on them.
> > > +         */
> > > +        if (RelationUsesLocalBuffers(onerel))
> > > +        {
> > > +            /*
> > > +             * Give warning only if the user explicitly tries to perform a
> > > +             * parallel vacuum on the temporary table.
> > > +             */
> > > +            if (params->nworkers > 0)
> > > +                ereport(WARNING,
> > > +                        (errmsg("disabling parallel option of vacuum
> > > on \"%s\" --- cannot vacuum temporary tables in parallel",
> > >
> > > From v45 patch, we moved warning of temporary table into
> > > "params->nworkers >= 0 && vacrelstats->useindex)" check so if table
> > > don't have any index, then we are not giving any warning. I think, we
> > > should give warning for all the temporary tables if parallel degree is
> > > given. (Till v44 patch, we were giving warning for all the temporary
> > > tables(having index and without index))
> > >
> > > Thoughts?
> >
> > Hi,
> > I did some more review.  Below is the 1 review comment for v46-0002.
> >
> > +    /*
> > +     * Initialize the state for parallel vacuum
> > +     */
> > +    if (params->nworkers >= 0 && vacrelstats->useindex)
> > +    {
> > +        /*
> > +         * Since parallel workers cannot access data in temporary tables, we
> > +         * can't perform parallel vacuum on them.
> > +         */
> > +        if (RelationUsesLocalBuffers(onerel)
> >
> > In above check, we should add "nindexes > 1" check so that if there is only 1 index, then we will not call
begin_parallel_vacuum.
>
> I think, " if (params->nworkers >= 0 && nindexes > 1)" check will be
> enough here .
>

Hmm, I think that if we removed vacrelstats->useindex from that condition,
we would call begin_parallel_vacuum even when index cleanup is disabled.
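
In other words, the check being discussed would look roughly like this
(names taken from the hunks quoted above; only a sketch, not the final
wording of the patch):

    /*
     * Try a parallel vacuum only when it is not disabled (nworkers >= 0),
     * index cleanup is enabled, and there is more than one index.
     */
    if (params->nworkers >= 0 && vacrelstats->useindex && nindexes > 1)
    {
        /* ... set up the parallel vacuum state (begin_parallel_vacuum) ... */
    }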

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> Thank you for updating the patch! I have a few small comments.
>

I have adapted all your changes, fixed the comment by Mahendra related
to initializing parallel state only when there are at least two
indexes.  Additionally, I have changed a few comments (make the
reference to parallel vacuum consistent, at some places we were
referring it as 'parallel lazy vacuum' and at other places it was
'parallel index vacuum').

> The
> rest looks good to me.
>

Okay, I think the patch is in good shape.  I am planning to read it a
few more times (at least 2 times) and then probably will commit it
early next week (Monday or Tuesday) unless there are any major
comments.  I have already committed the API patch (4d8a8d0c73).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Wed, 15 Jan 2020 at 17:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Thank you for updating the patch! I have a few small comments.
> >
>
> I have adapted all your changes, fixed the comment by Mahendra related
> to initializing parallel state only when there are at least two
> indexes.  Additionally, I have changed a few comments (make the
> reference to parallel vacuum consistent, at some places we were
> referring it as 'parallel lazy vacuum' and at other places it was
> 'parallel index vacuum').
>
> > The
> > rest looks good to me.
> >
>
> Okay, I think the patch is in good shape.  I am planning to read it a
> few more times (at least 2 times) and then probably will commit it
> early next week (Monday or Tuesday) unless there are any major
> comments.  I have already committed the API patch (4d8a8d0c73).
>

Hi,
Thanks Amit for fixing review comments.

I reviewed v48 patch and below are some comments.

1.
+    * based on the number of indexes.  -1 indicates a parallel vacuum is

I think, above should be like "-1 indicates that parallel vacuum is"

2.
+/* Variables for cost-based parallel vacuum  */

At the end of comment, there is 2 spaces.  I think, it should be only 1 space.

3.
I think, we should add a test case for parallel option(when degree is not specified).
Ex:
postgres=# VACUUM (PARALLEL) tmp;
ERROR:  parallel option requires a value between 0 and 1024
LINE 1: VACUUM (PARALLEL) tmp;
                ^
postgres=#

Because above error is added in this parallel patch, so we should have test case for this to increase code coverage.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Wed, 15 Jan 2020 at 17:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Thank you for updating the patch! I have a few small comments.
> > >
> >
> > I have adapted all your changes, fixed the comment by Mahendra related
> > to initializing parallel state only when there are at least two
> > indexes.  Additionally, I have changed a few comments (make the
> > reference to parallel vacuum consistent, at some places we were
> > referring it as 'parallel lazy vacuum' and at other places it was
> > 'parallel index vacuum').
> >
> > > The
> > > rest looks good to me.
> > >
> >
> > Okay, I think the patch is in good shape.  I am planning to read it a
> > few more times (at least 2 times) and then probably will commit it
> > early next week (Monday or Tuesday) unless there are any major
> > comments.  I have already committed the API patch (4d8a8d0c73).
> >
>
> Hi,
> Thanks Amit for fixing review comments.
>
> I reviewed v48 patch and below are some comments.
>
> 1.
> +    * based on the number of indexes.  -1 indicates a parallel vacuum is
>
> I think, above should be like "-1 indicates that parallel vacuum is"
>
> 2.
> +/* Variables for cost-based parallel vacuum  */
>
> At the end of comment, there is 2 spaces.  I think, it should be only 1 space.
>
> 3.
> I think, we should add a test case for parallel option(when degree is not specified).
> Ex:
> postgres=# VACUUM (PARALLEL) tmp;
> ERROR:  parallel option requires a value between 0 and 1024
> LINE 1: VACUUM (PARALLEL) tmp;
>                 ^
> postgres=#
>
> Because above error is added in this parallel patch, so we should have test case for this to increase code coverage.
>

Hi
Below are some more review comments for v48 patch.

1.
#include "storage/bufpage.h"
#include "storage/lockdefs.h"
+#include "storage/shm_toc.h"
+#include "storage/dsm.h"

Here, order of header file is not alphabetically. (storage/dsm.h
should come before storage/lockdefs.h)

2.
+    /* No index supports parallel vacuum */
+    if (nindexes_parallel == 0)
+        return 0;
+
+    /* The leader process takes one index */
+    nindexes_parallel--;

Above code can be rearranged as:

+    /* The leader process takes one index */
+    nindexes_parallel--;
+
+    /* No index supports parallel vacuum */
+    if (nindexes_parallel <= 0)
+        return 0;

If we do like this, then in some cases, we can skip some calculations
of parallel workers.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Wed, 15 Jan 2020 at 17:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Thank you for updating the patch! I have a few small comments.
> > > >
> > >
> > > I have adapted all your changes, fixed the comment by Mahendra related
> > > to initializing parallel state only when there are at least two
> > > indexes.  Additionally, I have changed a few comments (make the
> > > reference to parallel vacuum consistent, at some places we were
> > > referring it as 'parallel lazy vacuum' and at other places it was
> > > 'parallel index vacuum').
> > >
> > > > The
> > > > rest looks good to me.
> > > >
> > >
> > > Okay, I think the patch is in good shape.  I am planning to read it a
> > > few more times (at least 2 times) and then probably will commit it
> > > early next week (Monday or Tuesday) unless there are any major
> > > comments.  I have already committed the API patch (4d8a8d0c73).
> > >
> >
> > Hi,
> > Thanks Amit for fixing review comments.
> >
> > I reviewed v48 patch and below are some comments.
> >
> > 1.
> > +    * based on the number of indexes.  -1 indicates a parallel vacuum is
> >
> > I think, above should be like "-1 indicates that parallel vacuum is"
> >
> > 2.
> > +/* Variables for cost-based parallel vacuum  */
> >
> > At the end of comment, there is 2 spaces.  I think, it should be only 1 space.
> >
> > 3.
> > I think, we should add a test case for parallel option(when degree is not specified).
> > Ex:
> > postgres=# VACUUM (PARALLEL) tmp;
> > ERROR:  parallel option requires a value between 0 and 1024
> > LINE 1: VACUUM (PARALLEL) tmp;
> >                 ^
> > postgres=#
> >
> > Because above error is added in this parallel patch, so we should have test case for this to increase code
coverage.
> >
>
> Hi
> Below are some more review comments for v48 patch.
>
> 1.
> #include "storage/bufpage.h"
> #include "storage/lockdefs.h"
> +#include "storage/shm_toc.h"
> +#include "storage/dsm.h"
>
> Here, order of header file is not alphabetically. (storage/dsm.h
> should come before storage/lockdefs.h)
>
> 2.
> +    /* No index supports parallel vacuum */
> +    if (nindexes_parallel == 0)
> +        return 0;
> +
> +    /* The leader process takes one index */
> +    nindexes_parallel--;
>
> Above code can be rearranged as:
>
> +    /* The leader process takes one index */
> +    nindexes_parallel--;
> +
> +    /* No index supports parallel vacuum */
> +    if (nindexes_parallel <= 0)
> +        return 0;
>
> If we do like this, then in some cases, we can skip some calculations
> of parallel workers.
>
> --
> Thanks and Regards
> Mahendra Singh Thalor
> EnterpriseDB: http://www.enterprisedb.com

Hi,
I checked the code coverage and the time taken by the vacuum.sql test with
and without the v48 patch. Below are some findings (I ran "make check-world
-i" to get the coverage).

1.
With v45 patch, compute_parallel_delay is never called so function hit
is zero. I think, we can add some delay options into vacuum.sql test
to hit function.

2.
I checked time taken by vacuum.sql test. Execution time is almost same
with and without v45 patch.

Without v45 patch:
Run1) vacuum                       ... ok 701 ms
Run2) vacuum                       ... ok 549 ms
Run3) vacuum                       ... ok 559 ms
Run4) vacuum                       ... ok 480 ms

With v45 patch:
Run1) vacuum                       ... ok 842 ms
Run2) vacuum                       ... ok 808 ms
Run3)  vacuum                       ... ok 774 ms
Run4) vacuum                       ... ok 792 ms

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Jan 16, 2020 at 1:02 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > >
> > > I reviewed v48 patch and below are some comments.
> > >
> > > 1.
> > > +    * based on the number of indexes.  -1 indicates a parallel vacuum is
> > >
> > > I think, above should be like "-1 indicates that parallel vacuum is"
> > >

I am not an expert in this matter, but I am not sure if your
suggestion is correct.  I thought an article is required here, but I
could be wrong.  Can you please clarify?

> > > 2.
> > > +/* Variables for cost-based parallel vacuum  */
> > >
> > > At the end of comment, there is 2 spaces.  I think, it should be only 1 space.
> > >
> > > 3.
> > > I think, we should add a test case for parallel option(when degree is not specified).
> > > Ex:
> > > postgres=# VACUUM (PARALLEL) tmp;
> > > ERROR:  parallel option requires a value between 0 and 1024
> > > LINE 1: VACUUM (PARALLEL) tmp;
> > >                 ^
> > > postgres=#
> > >
> > > Because above error is added in this parallel patch, so we should have test case for this to increase code
coverage.
> > >

I thought about it but was not sure whether to add a test for it.  We might
not want to add a test for each and every case, as that will increase the
number and runtime of the tests without a significant advantage.  Now that
you have pointed this out, I can add a test for it unless someone else
thinks otherwise.

>
> 1.
> With v45 patch, compute_parallel_delay is never called so function hit
> is zero. I think, we can add some delay options into vacuum.sql test
> to hit function.
>

But how can we meaningfully test the functionality of the delay?  It
would be tricky to come up with a portable test that can always
produce consistent results.

> 2.
> I checked time taken by vacuum.sql test. Execution time is almost same
> with and without v45 patch.
>
> Without v45 patch:
> Run1) vacuum                       ... ok 701 ms
> Run2) vacuum                       ... ok 549 ms
> Run3) vacuum                       ... ok 559 ms
> Run4) vacuum                       ... ok 480 ms
>
> With v45 patch:
> Run1) vacuum                       ... ok 842 ms
> Run2) vacuum                       ... ok 808 ms
> Run3)  vacuum                       ... ok 774 ms
> Run4) vacuum                       ... ok 792 ms
>

I see some variance in the results; have you run with autovacuum set to
off?  I was expecting that this might speed up some cases where parallel
vacuum is used by default.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Thu, 16 Jan 2020 at 08:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 1:02 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > >
> > > >
> > > > I reviewed v48 patch and below are some comments.
> > > >
> > > > 1.
> > > > +    * based on the number of indexes.  -1 indicates a parallel vacuum is
> > > >
> > > > I think, above should be like "-1 indicates that parallel vacuum is"
> > > >
>
> I am not an expert in this matter, but I am not sure if your
> suggestion is correct.  I thought an article is required here, but I
> could be wrong.  Can you please clarify?
>
> > > > 2.
> > > > +/* Variables for cost-based parallel vacuum  */
> > > >
> > > > At the end of comment, there is 2 spaces.  I think, it should be only 1 space.
> > > >
> > > > 3.
> > > > I think, we should add a test case for parallel option(when degree is not specified).
> > > > Ex:
> > > > postgres=# VACUUM (PARALLEL) tmp;
> > > > ERROR:  parallel option requires a value between 0 and 1024
> > > > LINE 1: VACUUM (PARALLEL) tmp;
> > > >                 ^
> > > > postgres=#
> > > >
> > > > Because above error is added in this parallel patch, so we should have test case for this to increase code
coverage.
> > > >
>
> I thought about it but was not sure to add a test for it.  We might
> not want to add a test for each and every case as that will increase
> the number and time of tests without a significant advantage.  Now
> that you have pointed this, I can add a test for it unless someone
> else thinks otherwise.
>
> >
> > 1.
> > With v45 patch, compute_parallel_delay is never called so function hit
> > is zero. I think, we can add some delay options into vacuum.sql test
> > to hit function.
> >
>
> But how can we meaningfully test the functionality of the delay?  It
> would be tricky to come up with a portable test that can always
> produce consistent results.
>
> > 2.
> > I checked time taken by vacuum.sql test. Execution time is almost same
> > with and without v45 patch.
> >
> > Without v45 patch:
> > Run1) vacuum                       ... ok 701 ms
> > Run2) vacuum                       ... ok 549 ms
> > Run3) vacuum                       ... ok 559 ms
> > Run4) vacuum                       ... ok 480 ms
> >
> > With v45 patch:
> > Run1) vacuum                       ... ok 842 ms
> > Run2) vacuum                       ... ok 808 ms
> > Run3)  vacuum                       ... ok 774 ms
> > Run4) vacuum                       ... ok 792 ms
> >
>
> I see some variance in results, have you run with autovacuum as off.
> I was expecting that this might speed up some cases where parallel
> vacuum is used by default.

I think this difference in timing is expected because we are adding some
vacuum-related tests. I am not starting the server manually (meaning I am
starting the server with only the default settings).

If we start the server with the default settings, then the vacuum-related
test cases will not exercise the parallel path, because the index relations
are very small and so we will not do a parallel vacuum.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Jan 16, 2020 at 10:11 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> On Thu, 16 Jan 2020 at 08:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > 2.
> > > I checked time taken by vacuum.sql test. Execution time is almost same
> > > with and without v45 patch.
> > >
> > > Without v45 patch:
> > > Run1) vacuum                       ... ok 701 ms
> > > Run2) vacuum                       ... ok 549 ms
> > > Run3) vacuum                       ... ok 559 ms
> > > Run4) vacuum                       ... ok 480 ms
> > >
> > > With v45 patch:
> > > Run1) vacuum                       ... ok 842 ms
> > > Run2) vacuum                       ... ok 808 ms
> > > Run3)  vacuum                       ... ok 774 ms
> > > Run4) vacuum                       ... ok 792 ms
> > >
> >
> > I see some variance in results, have you run with autovacuum as off.
> > I was expecting that this might speed up some cases where parallel
> > vacuum is used by default.
>
> I think, this is expected difference in timing because we are adding
> some vacuum related test. I am not starting server manually(means I am
> starting server with only default setting).
>

Can you test once with autovacuum = off?  Autovacuum leads to variability
in the test timing.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, 16 Jan 2020 at 14:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 10:11 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Thu, 16 Jan 2020 at 08:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > 2.
> > > > I checked time taken by vacuum.sql test. Execution time is almost same
> > > > with and without v45 patch.
> > > >
> > > > Without v45 patch:
> > > > Run1) vacuum                       ... ok 701 ms
> > > > Run2) vacuum                       ... ok 549 ms
> > > > Run3) vacuum                       ... ok 559 ms
> > > > Run4) vacuum                       ... ok 480 ms
> > > >
> > > > With v45 patch:
> > > > Run1) vacuum                       ... ok 842 ms
> > > > Run2) vacuum                       ... ok 808 ms
> > > > Run3)  vacuum                       ... ok 774 ms
> > > > Run4) vacuum                       ... ok 792 ms
> > > >
> > >
> > > I see some variance in results, have you run with autovacuum as off.
> > > I was expecting that this might speed up some cases where parallel
> > > vacuum is used by default.
> >
> > I think, this is expected difference in timing because we are adding
> > some vacuum related test. I am not starting server manually(means I am
> > starting server with only default setting).
> >
>
> Can you once test by setting autovacuum = off?  The autovacuum leads
> to variability in test timing.
>
>

I've also run the regression tests with and without the patch:

* w/o patch and autovacuum = on:  255 ms
* w/o patch and autovacuum = off: 258 ms
* w/ patch and autovacuum = on: 370 ms
* w/ patch and autovacuum = off: 375 ms

> > If we start server with default settings, then we will not hit vacuum
> > related test cases to parallel because size of index relation is very
> > small so we will not do parallel vacuum.

Right. Most indexes (all?) of the tables used in the regression tests are
smaller than min_parallel_index_scan_size. And although we set
min_parallel_index_scan_size to 0 in vacuum.sql, VACUUM would not be sped
up much because of the relation sizes. Since we instead populate a new
table for the parallel vacuum testing, the regression test for vacuum takes
longer.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Thu, Jan 16, 2020 at 4:46 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> Right. Most indexes (all?) of tables that are used in the regression
> tests are smaller than min_parallel_index_scan_size. And we set
> min_parallel_index_scan_size to 0 in vacuum.sql but VACUUM would not
> be speeded-up much because of the relation size. Since we instead
> populate new table for parallel vacuum testing the regression test for
> vacuum would take a longer time.
>

Fair enough and I think it is good in a way that it won't change the
coverage of existing vacuum code.  I have fixed all the issues
reported by Mahendra and have fixed a few other cosmetic things in the
attached patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Prabhat Sahu
Date:
Hi all,

I would like to share my observations on this PG feature, "Block-level parallel vacuum".
I have tested the earlier patch (i.e. v48) with the below high-level test scenarios, and they work as expected.
  • Played around with these GUC parameters while testing:
    max_worker_processes, autovacuum (= off), shared_buffers, max_parallel_workers, max_parallel_maintenance_workers, min_parallel_index_scan_size, vacuum_cost_limit, vacuum_cost_delay
  • Tested parallel vacuum with plain and partitioned tables covering the possible datatypes, with columns having various indexes (btree, gist, etc.) on part of or the full table.
  • Tested the pgbench tables with multiple manually created indexes, running a script (vacuum_test.sql) with DMLs and VACUUM for multiple clients, jobs, and durations, e.g.:
    ./pgbench -c 8 -j 16 -T 900 postgres -f vacuum_test.sql
    We observed the usage of parallel workers during VACUUM.
  • Ran a few isolation schedule test cases (from the regression suite) with huge data and indexes, performing DMLs -> VACUUM.
  • Tested with partitioned tables -> global/local indexes -> DMLs -> VACUUM.
  • Tested with partitioned tables having different tablespaces in different locations -> global/local indexes -> DMLs -> VACUUM.
  • Changed STORAGE options for columns (PLAIN / EXTERNAL / EXTENDED) -> DMLs -> VACUUM.
  • Created indexes with the CONCURRENTLY option / changed index storage parameters as below -> DMLs -> VACUUM:
    with (buffering=auto) / with (buffering=on) / with (buffering=off) / with (fillfactor=30);
  • Created simple and partitioned tables -> DMLs -> pg_dump/pg_restore/pg_upgrade -> VACUUM, and verified the data after restore / upgrade / VACUUM.
  • Indexes on UUID-OSSP data -> DMLs -> pg_upgrade -> VACUUM.
  • Verified with various test scenarios that parallel VACUUM performs better than non-parallel VACUUM:
    time taken by VACUUM on PG HEAD + patch (with PARALLEL) < time taken by VACUUM on PG HEAD (without PARALLEL)

Machine configuration: (16 VCPUs / RAM: 16GB / Disk size: 640GB)
PG HEAD:
VACUUM tab1;
Time: 38915.384 ms (00:38.915)
Time: 48389.006 ms (00:48.389)
Time: 41324.223 ms (00:41.324)
Time: 37640.874 ms (00:37.641) --median
Time: 36897.325 ms (00:36.897)
Time: 36351.022 ms (00:36.351)
Time: 36198.890 ms (00:36.199)

PG HEAD + v48 Patch:
VACUUM tab1;
Time: 37051.589 ms (00:37.052)
Time: 33647.459 ms (00:33.647) --median
Time: 31580.894 ms (00:31.581)
Time: 34442.046 ms (00:34.442)
Time: 31335.960 ms (00:31.336)
Time: 34441.245 ms (00:34.441)
Time: 31159.639 ms (00:31.160)



--

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Jan 16, 2020 at 5:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 4:46 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Right. Most indexes (all?) of tables that are used in the regression
> > tests are smaller than min_parallel_index_scan_size. And we set
> > min_parallel_index_scan_size to 0 in vacuum.sql but VACUUM would not
> > be speeded-up much because of the relation size. Since we instead
> > populate new table for parallel vacuum testing the regression test for
> > vacuum would take a longer time.
> >
>
> Fair enough and I think it is good in a way that it won't change the
> coverage of existing vacuum code.  I have fixed all the issues
> reported by Mahendra and have fixed a few other cosmetic things in the
> attached patch.
>
I have few small comments.

1.
logical streaming for large in-progress transactions+
+ /* Can't perform vacuum in parallel */
+ if (parallel_workers <= 0)
+ {
+ pfree(can_parallel_vacuum);
+ return lps;
+ }

why are we checking parallel_workers <= 0, Function
compute_parallel_vacuum_workers only returns 0 or greater than 0
so isn't it better to just check if (parallel_workers == 0) ?

2.
+/*
+ * Macro to check if we are in a parallel vacuum.  If true, we are in the
+ * parallel mode and the DSM segment is initialized.
+ */
+#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL)

(LVParallelState *) (lps) -> this typecast is not required, just (lps)
!= NULL should be enough.

3.

+ shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
+ prepare_index_statistics(shared, can_parallel_vacuum, nindexes);
+ pg_atomic_init_u32(&(shared->idx), 0);
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);

I think it will look cleaner if we can initialize in the order they
are declared in structure.

4.
+ VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
+ VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay.
+ */
+ pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
+ pg_atomic_write_u32(VacuumActiveNWorkers, 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(lps->pcxt, nworkers);
+
+ LaunchParallelWorkers(lps->pcxt);
+
+ if (lps->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+ }
+ else
+ {
+ /*
+ * Disable shared cost balance if we are not able to launch
+ * workers.
+ */
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+

I don't like the idea of first initializing the
VacuumSharedCostBalance with lps->lvshared->cost_balance and then
uninitialize if nworkers_launched is 0.
I am not sure why do we need to initialize VacuumSharedCostBalance
here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
VacuumCostBalance);?
I think we can initialize it only if nworkers_launched > 0 then we can
get rid of the else branch completely.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 5:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 16, 2020 at 4:46 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Right. Most indexes (all?) of tables that are used in the regression
> > > tests are smaller than min_parallel_index_scan_size. And we set
> > > min_parallel_index_scan_size to 0 in vacuum.sql but VACUUM would not
> > > be speeded-up much because of the relation size. Since we instead
> > > populate new table for parallel vacuum testing the regression test for
> > > vacuum would take a longer time.
> > >
> >
> > Fair enough and I think it is good in a way that it won't change the
> > coverage of existing vacuum code.  I have fixed all the issues
> > reported by Mahendra and have fixed a few other cosmetic things in the
> > attached patch.
> >
> I have few small comments.
>
> 1.
> logical streaming for large in-progress transactions+
> + /* Can't perform vacuum in parallel */
> + if (parallel_workers <= 0)
> + {
> + pfree(can_parallel_vacuum);
> + return lps;
> + }
>
> why are we checking parallel_workers <= 0, Function
> compute_parallel_vacuum_workers only returns 0 or greater than 0
> so isn't it better to just check if (parallel_workers == 0) ?
>
> 2.
> +/*
> + * Macro to check if we are in a parallel vacuum.  If true, we are in the
> + * parallel mode and the DSM segment is initialized.
> + */
> +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL)
>
> (LVParallelState *) (lps) -> this typecast is not required, just (lps)
> != NULL should be enough.
>
> 3.
>
> + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
> + prepare_index_statistics(shared, can_parallel_vacuum, nindexes);
> + pg_atomic_init_u32(&(shared->idx), 0);
> + pg_atomic_init_u32(&(shared->cost_balance), 0);
> + pg_atomic_init_u32(&(shared->active_nworkers), 0);
>
> I think it will look cleaner if we can initialize in the order they
> are declared in structure.
>
> 4.
> + VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
> + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
> +
> + /*
> + * Set up shared cost balance and the number of active workers for
> + * vacuum delay.
> + */
> + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
> + pg_atomic_write_u32(VacuumActiveNWorkers, 0);
> +
> + /*
> + * The number of workers can vary between bulkdelete and cleanup
> + * phase.
> + */
> + ReinitializeParallelWorkers(lps->pcxt, nworkers);
> +
> + LaunchParallelWorkers(lps->pcxt);
> +
> + if (lps->pcxt->nworkers_launched > 0)
> + {
> + /*
> + * Reset the local cost values for leader backend as we have
> + * already accumulated the remaining balance of heap.
> + */
> + VacuumCostBalance = 0;
> + VacuumCostBalanceLocal = 0;
> + }
> + else
> + {
> + /*
> + * Disable shared cost balance if we are not able to launch
> + * workers.
> + */
> + VacuumSharedCostBalance = NULL;
> + VacuumActiveNWorkers = NULL;
> + }
> +
>
> I don't like the idea of first initializing the
> VacuumSharedCostBalance with lps->lvshared->cost_balance and then
> uninitialize if nworkers_launched is 0.
> I am not sure why do we need to initialize VacuumSharedCostBalance
> here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
> VacuumCostBalance);?
> I think we can initialize it only if nworkers_launched > 0 then we can
> get rid of the else branch completely.

I missed one of my comments.

+ /* Carry the shared balance value to heap scan */
+ if (VacuumSharedCostBalance)
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+
+ if (nworkers > 0)
+ {
+ /* Disable shared cost balance */
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }

Doesn't make sense to keep them as two conditions, we can combine them as below

/* If shared costing is enable, carry the shared balance value to heap
scan and disable the shared costing */
 if (VacuumSharedCostBalance)
{
     VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
     VacuumSharedCostBalance = NULL;
     VacuumActiveNWorkers = NULL;
}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have few small comments.
>
> 1.
> logical streaming for large in-progress transactions+
> + /* Can't perform vacuum in parallel */
> + if (parallel_workers <= 0)
> + {
> + pfree(can_parallel_vacuum);
> + return lps;
> + }
>
> why are we checking parallel_workers <= 0, Function
> compute_parallel_vacuum_workers only returns 0 or greater than 0
> so isn't it better to just check if (parallel_workers == 0) ?
>

Why to have such an assumption about
compute_parallel_vacuum_workers()?  The function
compute_parallel_vacuum_workers() returns int, so such a check
(<= 0) seems reasonable to me.

> 2.
> +/*
> + * Macro to check if we are in a parallel vacuum.  If true, we are in the
> + * parallel mode and the DSM segment is initialized.
> + */
> +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL)
>
> (LVParallelState *) (lps) -> this typecast is not required, just (lps)
> != NULL should be enough.
>

I think the better idea would be to just replace it PointerIsValid
like below. I see similar usage in other places.
#define ParallelVacuumIsActive(lps)  PointerIsValid(lps)

> 3.
>
> + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
> + prepare_index_statistics(shared, can_parallel_vacuum, nindexes);
> + pg_atomic_init_u32(&(shared->idx), 0);
> + pg_atomic_init_u32(&(shared->cost_balance), 0);
> + pg_atomic_init_u32(&(shared->active_nworkers), 0);
>
> I think it will look cleaner if we can initialize in the order they
> are declared in structure.
>

Okay.

> 4.
> + VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
> + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
> +
> + /*
> + * Set up shared cost balance and the number of active workers for
> + * vacuum delay.
> + */
> + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
> + pg_atomic_write_u32(VacuumActiveNWorkers, 0);
> +
> + /*
> + * The number of workers can vary between bulkdelete and cleanup
> + * phase.
> + */
> + ReinitializeParallelWorkers(lps->pcxt, nworkers);
> +
> + LaunchParallelWorkers(lps->pcxt);
> +
> + if (lps->pcxt->nworkers_launched > 0)
> + {
> + /*
> + * Reset the local cost values for leader backend as we have
> + * already accumulated the remaining balance of heap.
> + */
> + VacuumCostBalance = 0;
> + VacuumCostBalanceLocal = 0;
> + }
> + else
> + {
> + /*
> + * Disable shared cost balance if we are not able to launch
> + * workers.
> + */
> + VacuumSharedCostBalance = NULL;
> + VacuumActiveNWorkers = NULL;
> + }
> +
>
> I don't like the idea of first initializing the
> VacuumSharedCostBalance with lps->lvshared->cost_balance and then
> uninitialize if nworkers_launched is 0.
> I am not sure why do we need to initialize VacuumSharedCostBalance
> here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
> VacuumCostBalance);?
> I think we can initialize it only if nworkers_launched > 0 then we can
> get rid of the else branch completely.
>

No, we can't initialize after nworkers_launched > 0 because by that
time some workers would have already tried to access the shared cost
balance.  So, it needs to be done before launching the workers as is
done in code.  We can probably add a comment.

>
> + /* Carry the shared balance value to heap scan */
> + if (VacuumSharedCostBalance)
> + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
> +
> + if (nworkers > 0)
> + {
> + /* Disable shared cost balance */
> + VacuumSharedCostBalance = NULL;
> + VacuumActiveNWorkers = NULL;
> + }
>
> Doesn't make sense to keep them as two conditions, we can combine them as below
>
> /* If shared costing is enable, carry the shared balance value to heap
> scan and disable the shared costing */
>  if (VacuumSharedCostBalance)
> {
>      VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
>      VacuumSharedCostBalance = NULL;
>      VacuumActiveNWorkers = NULL;
> }
>

makes sense to me, will change.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Jan 17, 2020 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I have few small comments.
> >
> > 1.
> > logical streaming for large in-progress transactions+
> > + /* Can't perform vacuum in parallel */
> > + if (parallel_workers <= 0)
> > + {
> > + pfree(can_parallel_vacuum);
> > + return lps;
> > + }
> >
> > why are we checking parallel_workers <= 0, Function
> > compute_parallel_vacuum_workers only returns 0 or greater than 0
> > so isn't it better to just check if (parallel_workers == 0) ?
> >
>
> Why to have such an assumption about
> compute_parallel_vacuum_workers()?  The function
> compute_parallel_vacuum_workers() returns int, so such a check
> (<= 0) seems reasonable to me.

Okay so I should probably change my statement to why
compute_parallel_vacuum_workers is returning "int" instead of uint?  I
mean when this function is designed to return 0 or more worker why to
make it return int and then handle extra values on caller.  Am I
missing something, can it really return negative in some cases?

I find the below code in "compute_parallel_vacuum_workers" a bit confusing

+static int
+compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
+ bool *can_parallel_vacuum)
+{
......
+ /* The leader process takes one index */
+ nindexes_parallel--;        --> nindexes_parallel can become -1
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel == 0) .  -> Now if it is 0 then return 0 but
if its -1 then continue. seems strange no?  I think here itself we can
handle if (nindexes_parallel <= 0), that will make code cleaner.
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;



>
> > 2.
> > +/*
> > + * Macro to check if we are in a parallel vacuum.  If true, we are in the
> > + * parallel mode and the DSM segment is initialized.
> > + */
> > +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL)
> >
> > (LVParallelState *) (lps) -> this typecast is not required, just (lps)
> > != NULL should be enough.
> >
>
> I think the better idea would be to just replace it PointerIsValid
> like below. I see similar usage in other places.
> #define ParallelVacuumIsActive(lps)  PointerIsValid(lps)
Make sense to me.
>
> > 3.
> >
> > + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
> > + prepare_index_statistics(shared, can_parallel_vacuum, nindexes);
> > + pg_atomic_init_u32(&(shared->idx), 0);
> > + pg_atomic_init_u32(&(shared->cost_balance), 0);
> > + pg_atomic_init_u32(&(shared->active_nworkers), 0);
> >
> > I think it will look cleaner if we can initialize in the order they
> > are declared in structure.
> >
>
> Okay.
>
> > 4.
> > + VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
> > + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
> > +
> > + /*
> > + * Set up shared cost balance and the number of active workers for
> > + * vacuum delay.
> > + */
> > + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
> > + pg_atomic_write_u32(VacuumActiveNWorkers, 0);
> > +
> > + /*
> > + * The number of workers can vary between bulkdelete and cleanup
> > + * phase.
> > + */
> > + ReinitializeParallelWorkers(lps->pcxt, nworkers);
> > +
> > + LaunchParallelWorkers(lps->pcxt);
> > +
> > + if (lps->pcxt->nworkers_launched > 0)
> > + {
> > + /*
> > + * Reset the local cost values for leader backend as we have
> > + * already accumulated the remaining balance of heap.
> > + */
> > + VacuumCostBalance = 0;
> > + VacuumCostBalanceLocal = 0;
> > + }
> > + else
> > + {
> > + /*
> > + * Disable shared cost balance if we are not able to launch
> > + * workers.
> > + */
> > + VacuumSharedCostBalance = NULL;
> > + VacuumActiveNWorkers = NULL;
> > + }
> > +
> >
> > I don't like the idea of first initializing the
> > VacuumSharedCostBalance with lps->lvshared->cost_balance and then
> > uninitialize if nworkers_launched is 0.
> > I am not sure why do we need to initialize VacuumSharedCostBalance
> > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
> > VacuumCostBalance);?
> > I think we can initialize it only if nworkers_launched > 0 then we can
> > get rid of the else branch completely.
> >
>
> No, we can't initialize after nworkers_launched > 0 because by that
> time some workers would have already tried to access the shared cost
> balance.  So, it needs to be done before launching the workers as is
> done in code.  We can probably add a comment.
I don't think so, VacuumSharedCostBalance is a process local which is
just pointing to the shared memory variable right?

and each process has to point it to the shared memory and that we are
already doing in parallel_vacuum_main.  So we can initialize it after
worker is launched.
Basically code will look like below

pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
 ..
ReinitializeParallelWorkers(lps->pcxt, nworkers);

LaunchParallelWorkers(lps->pcxt);

if (lps->pcxt->nworkers_launched > 0)
{
..
VacuumCostBalance = 0;
VacuumCostBalanceLocal = 0;
VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}
-- remove the else part completely..

>
> >
> > + /* Carry the shared balance value to heap scan */
> > + if (VacuumSharedCostBalance)
> > + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
> > +
> > + if (nworkers > 0)
> > + {
> > + /* Disable shared cost balance */
> > + VacuumSharedCostBalance = NULL;
> > + VacuumActiveNWorkers = NULL;
> > + }
> >
> > Doesn't make sense to keep them as two conditions, we can combine them as below
> >
> > /* If shared costing is enable, carry the shared balance value to heap
> > scan and disable the shared costing */
> >  if (VacuumSharedCostBalance)
> > {
> >      VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
> >      VacuumSharedCostBalance = NULL;
> >      VacuumActiveNWorkers = NULL;
> > }
> >
>
> makes sense to me, will change.
ok

>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Jan 17, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I have few small comments.
> > >
> > > 1.
> > > logical streaming for large in-progress transactions+
> > > + /* Can't perform vacuum in parallel */
> > > + if (parallel_workers <= 0)
> > > + {
> > > + pfree(can_parallel_vacuum);
> > > + return lps;
> > > + }
> > >
> > > why are we checking parallel_workers <= 0, Function
> > > compute_parallel_vacuum_workers only returns 0 or greater than 0
> > > so isn't it better to just check if (parallel_workers == 0) ?
> > >
> >
> > Why to have such an assumption about
> > compute_parallel_vacuum_workers()?  The function
> > compute_parallel_vacuum_workers() returns int, so such a check
> > (<= 0) seems reasonable to me.
>
> Okay so I should probably change my statement to why
> compute_parallel_vacuum_workers is returning "int" instead of uint?
>

Hmm, I think the number of workers at most places is int, so it is
better to return int here which will keep it consistent with how we do
at other places.  See, the similar usage in compute_parallel_worker.

  I
> mean when this function is designed to return 0 or more worker why to
> make it return int and then handle extra values on caller.  Am I
> missing something, can it really return negative in some cases?
>
> I find the below code in "compute_parallel_vacuum_workers" a bit confusing
>
> +static int
> +compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
> + bool *can_parallel_vacuum)
> +{
> ......
> + /* The leader process takes one index */
> + nindexes_parallel--;        --> nindexes_parallel can become -1
> +
> + /* No index supports parallel vacuum */
> + if (nindexes_parallel == 0) .  -> Now if it is 0 then return 0 but
> if its -1 then continue. seems strange no?  I think here itself we can
> handle if (nindexes_parallel <= 0), that will make code cleaner.
> + return 0;
> +

I think this got recently introduced by one of my changes based on the
comment by Mahendra; we can adjust this check.

> > >
> > > I don't like the idea of first initializing the
> > > VacuumSharedCostBalance with lps->lvshared->cost_balance and then
> > > uninitialize if nworkers_launched is 0.
> > > I am not sure why do we need to initialize VacuumSharedCostBalance
> > > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
> > > VacuumCostBalance);?
> > > I think we can initialize it only if nworkers_launched > 0 then we can
> > > get rid of the else branch completely.
> > >
> >
> > No, we can't initialize after nworkers_launched > 0 because by that
> > time some workers would have already tried to access the shared cost
> > balance.  So, it needs to be done before launching the workers as is
> > done in code.  We can probably add a comment.
> I don't think so, VacuumSharedCostBalance is a process local which is
> just pointing to the shared memory variable right?
>
> and each process has to point it to the shared memory and that we are
> already doing in parallel_vacuum_main.  So we can initialize it after
> worker is launched.
> Basically code will look like below
>
> pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
> pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
>

oh, I thought you were telling to initialize the shared memory itself
after launching the workers.  However, you are asking to change the
usage of the local variable, I think we can do that.
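
For reference, the arrangement that seems to fall out of this exchange,
pieced together from the snippets quoted above (a sketch in the patch's
context, not the final code), would be:

    /* Write the shared values before launching, since workers read them */
    pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
    pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);

    /* The number of workers can vary between bulkdelete and cleanup phases */
    ReinitializeParallelWorkers(lps->pcxt, nworkers);

    LaunchParallelWorkers(lps->pcxt);

    if (lps->pcxt->nworkers_launched > 0)
    {
        /*
         * Point the leader at the shared counters only now; the leader's
         * remaining balance has already been folded into the shared value
         * above, so reset the local counters.
         */
        VacuumCostBalance = 0;
        VacuumCostBalanceLocal = 0;
        VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
        VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
    }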

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Jan 17, 2020 at 11:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Jan 17, 2020 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I have few small comments.
> > > >
> > > > 1.
> > > > logical streaming for large in-progress transactions+
> > > > + /* Can't perform vacuum in parallel */
> > > > + if (parallel_workers <= 0)
> > > > + {
> > > > + pfree(can_parallel_vacuum);
> > > > + return lps;
> > > > + }
> > > >
> > > > why are we checking parallel_workers <= 0, Function
> > > > compute_parallel_vacuum_workers only returns 0 or greater than 0
> > > > so isn't it better to just check if (parallel_workers == 0) ?
> > > >
> > >
> > > Why to have such an assumption about
> > > compute_parallel_vacuum_workers()?  The function
> > > compute_parallel_vacuum_workers() returns int, so such a check
> > > (<= 0) seems reasonable to me.
> >
> > Okay so I should probably change my statement to why
> > compute_parallel_vacuum_workers is returning "int" instead of uint?
> >
>
> Hmm, I think the number of workers at most places is int, so it is
> better to return int here which will keep it consistent with how we do
> at other places.  See, the similar usage in compute_parallel_worker.

Okay, I see.

>
>   I
> > mean when this function is designed to return 0 or more worker why to
> > make it return int and then handle extra values on caller.  Am I
> > missing something, can it really return negative in some cases?
> >
> > I find the below code in "compute_parallel_vacuum_workers" a bit confusing
> >
> > +static int
> > +compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
> > + bool *can_parallel_vacuum)
> > +{
> > ......
> > + /* The leader process takes one index */
> > + nindexes_parallel--;        --> nindexes_parallel can become -1
> > +
> > + /* No index supports parallel vacuum */
> > + if (nindexes_parallel == 0) .  -> Now if it is 0 then return 0 but
> > if its -1 then continue. seems strange no?  I think here itself we can
> > handle if (nindexes_parallel <= 0), that will make code cleaner.
> > + return 0;
> > +
>
> I think this got recently introduce by one of my changes based on the
> comment by Mahendra, we can adjust this check.

Ok
>
> > > >
> > > > I don't like the idea of first initializing the
> > > > VacuumSharedCostBalance with lps->lvshared->cost_balance and then
> > > > uninitialize if nworkers_launched is 0.
> > > > I am not sure why do we need to initialize VacuumSharedCostBalance
> > > > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
> > > > VacuumCostBalance);?
> > > > I think we can initialize it only if nworkers_launched > 0 then we can
> > > > get rid of the else branch completely.
> > > >
> > >
> > > No, we can't initialize after nworkers_launched > 0 because by that
> > > time some workers would have already tried to access the shared cost
> > > balance.  So, it needs to be done before launching the workers as is
> > > done in code.  We can probably add a comment.
> > I don't think so, VacuumSharedCostBalance is a process local which is
> > just pointing to the shared memory variable right?
> >
> > and each process has to point it to the shared memory and that we are
> > already doing in parallel_vacuum_main.  So we can initialize it after
> > worker is launched.
> > Basically code will look like below
> >
> > pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
> > pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
> >
>
> oh, I thought you were telling to initialize the shared memory itself
> after launching the workers.  However, you are asking to change the
> usage of the local variable, I think we can do that.

Okay.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Jan 17, 2020 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have performed cost-delay testing on the latest patch (I have used the
same scripts as attached in [1] and [2]):
vacuum_cost_delay = 10
vacuum_cost_limit = 2000

Observation: As we have concluded earlier, the delay time is in sync
with the I/O performed by the worker
and the total delay (heap + index) is almost the same as the
non-parallel operation.
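
To make that observation easier to follow, here is a toy, self-contained C
sketch of the shared cost-balancing idea (illustrative only, not the
patch's code; the exact proportional-sleep rule used here is an
assumption): every participant adds its I/O cost to a shared balance, and
whoever pushes the balance over the limit sleeps in proportion to the cost
it has contributed, so per-worker delay tracks per-worker I/O while the
total delay stays close to the single-process case.

    #include <stdatomic.h>
    #include <stdio.h>

    /* Toy model: shared cost balance, in the same units as vacuum_cost_limit */
    static _Atomic int shared_balance;

    typedef struct WorkerDelay
    {
        int     local_balance;   /* cost this worker has contributed so far */
        double  total_delay_ms;  /* delay charged to this worker (counted only) */
    } WorkerDelay;

    /* Charge the cost of one unit of work; returns the delay (ms) it would sleep */
    static double
    charge_cost(WorkerDelay *w, int cost, int cost_limit, double cost_delay_ms)
    {
        int     balance = atomic_fetch_add(&shared_balance, cost) + cost;
        double  msec = 0;

        w->local_balance += cost;
        if (balance >= cost_limit && w->local_balance > 0)
        {
            /* Sleep in proportion to what *this* worker contributed */
            msec = cost_delay_ms * (double) w->local_balance / cost_limit;
            atomic_fetch_sub(&shared_balance, w->local_balance);
            w->local_balance = 0;
        }
        w->total_delay_ms += msec;
        return msec;
    }

    int
    main(void)
    {
        WorkerDelay w1 = {0}, w2 = {0};

        /* w1 does twice the I/O of w2; expect roughly twice the delay */
        for (int i = 0; i < 20000; i++)
        {
            charge_cost(&w1, 2, 2000, 10.0);
            charge_cost(&w2, 1, 2000, 10.0);
        }
        printf("w1 delay=%.1f ms, w2 delay=%.1f ms\n",
               w1.total_delay_ms, w2.total_delay_ms);
        return 0;
    }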

test1:[1]

Vacuum non-parallel

WARNING:  VacuumCostTotalDelay=11332.320000

Vacuum 2 workers
WARNING:  worker 0 delay=171.085000 total io=34288 hit=22208 miss=0 dirty=604
WARNING:  worker 1 delay=87.790000 total io=17910 hit=17890 miss=0 dirty=1
WARNING:  worker 2 delay=88.620000 total io=17910 hit=17890 miss=0 dirty=1

WARNING:  VacuumCostTotalDelay=11505.650000

Vacuum 4 workers
WARNING:  worker 0 delay=87.750000 total io=17910 hit=17890 miss=0 dirty=1
WARNING:  worker 1 delay=89.155000 total io=17910 hit=17890 miss=0 dirty=1
WARNING:  worker 2 delay=87.080000 total io=17910 hit=17890 miss=0 dirty=1
WARNING:  worker 3 delay=78.745000 total io=16378 hit=4318 miss=0 dirty=603

WARNING:  VacuumCostTotalDelay=11590.680000


test2:[2]

Vacuum non-parallel
WARNING:  VacuumCostTotalDelay=22835.970000

Vacuum 2 workers
WARNING:  worker 0 delay=345.550000 total io=69338 hit=45338 miss=0 dirty=1200
WARNING:  worker 1 delay=177.150000 total io=35807 hit=35787 miss=0 dirty=1
WARNING:  worker 2 delay=178.105000 total io=35807 hit=35787 miss=0 dirty=1
WARNING:  VacuumCostTotalDelay=23191.405000


Vacuum 4 workers
WARNING:  worker 0 delay=177.265000 total io=35807 hit=35787 miss=0 dirty=1
WARNING:  worker 1 delay=177.175000 total io=35807 hit=35787 miss=0 dirty=1
WARNING:  worker 2 delay=177.385000 total io=35807 hit=35787 miss=0 dirty=1
WARNING:  worker 3 delay=166.515000 total io=33531 hit=9551 miss=0 dirty=1199
WARNING:  VacuumCostTotalDelay=23357.115000



[1] https://www.postgresql.org/message-id/CAFiTN-tFLN%3Dvdu5Ra-23E9_7Z1JXkk5MkRY3Bkj2zAoWK7fULA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFiTN-tC%3DNcvcEd%2B5J62fR8-D8x7EHuVi2xhS-0DMf1bnJs4hw%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Jan 17, 2020 at 12:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have performed cost delay testing on the latest test(I have used
> same script as attahced in [1] and [2].
> vacuum_cost_delay = 10
> vacuum_cost_limit = 2000
>
> Observation: As we have concluded earlier, the delay time is in sync
> with the I/O performed by the worker
> and the total delay (heap + index) is almost the same as the
> non-parallel operation.
>

Thanks for doing this test again.  In the attached patch, I have
addressed all the comments and modified a few comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Fri, 17 Jan 2020 at 14:47, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 12:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Jan 17, 2020 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I have performed cost delay testing on the latest test(I have used
> > same script as attahced in [1] and [2].
> > vacuum_cost_delay = 10
> > vacuum_cost_limit = 2000
> >
> > Observation: As we have concluded earlier, the delay time is in sync
> > with the I/O performed by the worker
> > and the total delay (heap + index) is almost the same as the
> > non-parallel operation.
> >
>
> Thanks for doing this test again.  In the attached patch, I have
> addressed all the comments and modified a few comments.
>

Hi,
Below are some review comments for the v50 patch.

1.
+LVShared
+LVSharedIndStats
+LVParallelState
 LWLock

I think, LVParallelState should come before LVSharedIndStats.

2.
+    /*
+     * It is possible that parallel context is initialized with fewer workers
+     * then the number of indexes that need a separate worker in the current
+     * phase, so we need to consider it.  See compute_parallel_vacuum_workers.
+     */

This comment is confusing. I think "then" should be replaced with "than".

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Peter Geoghegan
Date:
On Fri, Jan 17, 2020 at 1:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Thanks for doing this test again.  In the attached patch, I have
> addressed all the comments and modified a few comments.

I am in favor of the general idea of parallel VACUUM that parallelizes
the processing of each index (I haven't looked at the patch, though).
I observed something during a recent benchmark of the deduplication
patch that seems like it might be relevant to parallel VACUUM. This
happened during a recreation of the original WARM benchmark, which is
described here:

https://www.postgresql.org/message-id/CABOikdMNy6yowA%2BwTGK9RVd8iw%2BCzqHeQSGpW7Yka_4RSZ_LOQ%40mail.gmail.com

(There is an extra pgbench_accounts index on abalance, plus 4 indexes
on large text columns with filler MD5 hashes, all of which are
random.)

On the master branch, I can clearly observe that the "filler" MD5
indexes are bloated to a degree that is affected by the order of their
original creation/pg_class OID order. These are all indexes that
become bloated purely due to "version churn" -- or what I like to call
"unnecessary" page splits. The keys used in each pgbench_accounts
logical row never change, except in the case of the extra abalance
index (the idea is to prevent all HOT updates without ever updating
most indexed columns). I noticed that pgb_a_filler1 is a bit less
bloated than pgb_a_filler2, which is a little less bloated than
pgb_a_filler3, which is a little less bloated than pgb_a_filler4. Even
after 4 hours, and even though the "shape" of each index is identical.
This demonstrates an important general principle about vacuuming
indexes: timeliness can matter a lot.

In general, a big benefit of the deduplication patch is that it "buys
time" for VACUUM to run before "unnecessary" page splits can occur --
that is why the deduplication patch prevents *all* page splits in
these "filler" indexes, whereas on the master branch the filler
indexes are about 2x larger (the exact amount varies based on VACUUM
processing order, at least earlier on).

For tables with several indexes, giving each index its own VACUUM
worker process will prevent "unnecessary" page splits caused by
version churn, simply because VACUUM will start to clean each index
sooner than it would compared to serial processing (except for the
"lucky" first index). There is no "lucky" first index that gets
preferential treatment -- presumably VACUUM will start processing each
index at the same time with this patch, making each index equally
"lucky".

I think that there may even be a *complementary* effect with parallel
VACUUM, though I haven't tested that theory. Deduplication "buys time"
for VACUUM to run, while at the same time VACUUM takes less time to
show up and prevent "unnecessary" page splits. My guess is that these
two seemingly unrelated patches may actually address this "unnecessary
page split" problem from two completely different angles, with an
overall effect that is greater than the sum of its parts.

While the difference in size of each filler index on the master branch
wasn't that significant on its own, it's still interesting. It's
probably quite workload dependent.

-- 
Peter Geoghegan



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sun, Jan 19, 2020 at 2:15 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Jan 17, 2020 at 1:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Thanks for doing this test again.  In the attached patch, I have
> > addressed all the comments and modified a few comments.
>
> I am in favor of the general idea of parallel VACUUM that parallelizes
> the processing of each index (I haven't looked at the patch, though).
> I observed something during a recent benchmark of the deduplication
> patch that seems like it might be relevant to parallel VACUUM. This
> happened during a recreation of the original WARM benchmark, which is
> described here:
>
> https://www.postgresql.org/message-id/CABOikdMNy6yowA%2BwTGK9RVd8iw%2BCzqHeQSGpW7Yka_4RSZ_LOQ%40mail.gmail.com
>
> (There is an extra pgbench_accounts index on abalance, plus 4 indexes
> on large text columns with filler MD5 hashes, all of which are
> random.)
>
> On the master branch, I can clearly observe that the "filler" MD5
> indexes are bloated to a degree that is affected by the order of their
> original creation/pg_class OID order. These are all indexes that
> become bloated purely due to "version churn" -- or what I like to call
> "unnecessary" page splits. The keys used in each pgbench_accounts
> logical row never change, except in the case of the extra abalance
> index (the idea is to prevent all HOT updates without ever updating
> most indexed columns). I noticed that pgb_a_filler1 is a bit less
> bloated than pgb_a_filler2, which is a little less bloated than
> pgb_a_filler3, which is a little less bloated than pgb_a_filler4. Even
> after 4 hours, and even though the "shape" of each index is identical.
> This demonstrates an important general principle about vacuuming
> indexes: timeliness can matter a lot.
>
> In general, a big benefit of the deduplication patch is that it "buys
> time" for VACUUM to run before "unnecessary" page splits can occur --
> that is why the deduplication patch prevents *all* page splits in
> these "filler" indexes, whereas on the master branch the filler
> indexes are about 2x larger (the exact amount varies based on VACUUM
> processing order, at least earlier on).
>
> For tables with several indexes, giving each index its own VACUUM
> worker process will prevent "unnecessary" page splits caused by
> version churn, simply because VACUUM will start to clean each index
> sooner than it would compared to serial processing (except for the
> "lucky" first index). There is no "lucky" first index that gets
> preferential treatment -- presumably VACUUM will start processing each
> index at the same time with this patch, making each index equally
> "lucky".
>
> I think that there may even be a *complementary* effect with parallel
> VACUUM, though I haven't tested that theory. Deduplication "buys time"
> for VACUUM to run, while at the same time VACUUM takes less time to
> show up and prevent "unnecessary" page splits. My guess is that these
> two seemingly unrelated patches may actually address this "unnecessary
> page split" problem from two completely different angles, with an
> overall effect that is greater than the sum of its parts.
>

Good analysis, and I agree that the parallel vacuum patch can help in
such cases.  However, as of now, it only works via the VACUUM command, so
some user intervention is required to realize the benefit.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Jan 17, 2020 at 4:35 PM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> Below are some review comments for v50 patch.
>
> 1.
> +LVShared
> +LVSharedIndStats
> +LVParallelState
>  LWLock
>
> I think, LVParallelState should come before LVSharedIndStats.
>
> 2.
> +    /*
> +     * It is possible that parallel context is initialized with fewer workers
> +     * then the number of indexes that need a separate worker in the current
> +     * phase, so we need to consider it.  See compute_parallel_vacuum_workers.
> +     */
>
> This comment is confusing me. I think, "then" should be replaced with "than".
>

Pushed, after fixing these two comments.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 20 Jan 2020 at 12:39, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 4:35 PM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > Below are some review comments for v50 patch.
> >
> > 1.
> > +LVShared
> > +LVSharedIndStats
> > +LVParallelState
> >  LWLock
> >
> > I think, LVParallelState should come before LVSharedIndStats.
> >
> > 2.
> > +    /*
> > +     * It is possible that parallel context is initialized with fewer workers
> > +     * then the number of indexes that need a separate worker in the current
> > +     * phase, so we need to consider it.  See compute_parallel_vacuum_workers.
> > +     */
> >
> > This comment is confusing me. I think, "then" should be replaced with "than".
> >
>
> Pushed, after fixing these two comments.

Thank you for committing!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Andres Freund
Date:
Hi,

On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> Pushed, after fixing these two comments.

When attempting to vacuum a large table I just got:

postgres=# vacuum FREEZE ;
ERROR:  invalid memory alloc request size 1073741828

#0  palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
#1  0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8, relblocks=24686152)
    at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
#2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>, params=0x7ffdf8c00290, onerel=<optimized out>)
    at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
#3  heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>)
    at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
#4  0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290, rel=0x7fbcdff1e248)
    at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
#5  vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290)
    at /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882

It looks to me like the calculation moved into compute_max_dead_tuples()
continues to use an allocation ceiling
        maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
but the actual allocation now is

#define SizeOfLVDeadTuples(cnt) \
        add_size((offsetof(LVDeadTuples, itemptrs)), \
                 mul_size(sizeof(ItemPointerData), cnt))

i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
account.
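
The arithmetic lines up exactly with the reported request size.  A
standalone sketch (the struct layout below is a simplified stand-in,
assuming a two-int header in front of the flexible array, i.e. 8 bytes of
offsetof() overhead, and a 6-byte item pointer):

/*
 * Standalone arithmetic for the failure above; not PostgreSQL code, just a
 * stand-in layout that reproduces the reported request size.
 */
#include <stddef.h>
#include <stdio.h>

typedef struct
{
    unsigned short data[3];     /* stand-in for ItemPointerData, 6 bytes */
} ItemPointerData;

typedef struct
{
    int         max_tuples;
    int         num_tuples;
    ItemPointerData itemptrs[]; /* flexible array member */
} LVDeadTuples;

int
main(void)
{
    const size_t MaxAllocSize = 0x3fffffff;     /* 1073741823 */
    size_t      maxtuples = MaxAllocSize / sizeof(ItemPointerData);
    size_t      request = offsetof(LVDeadTuples, itemptrs) +
        maxtuples * sizeof(ItemPointerData);

    /* prints 1073741828, i.e. 5 bytes over MaxAllocSize */
    printf("allocation request = %zu (MaxAllocSize = %zu)\n",
           request, MaxAllocSize);
    return 0;
}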

Regards,

Andres



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 21, 2020 at 11:30 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> > Pushed, after fixing these two comments.
>
> When attempting to vacuum a large table I just got:
>
> postgres=# vacuum FREEZE ;
> ERROR:  invalid memory alloc request size 1073741828
>
> #0  palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
> #1  0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8,
relblocks=24686152)
>     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
> #2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>,
params=0x7ffdf8c00290,onerel=<optimized out>)
 
>     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
> #3  heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>)
>     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
> #4  0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290,
rel=0x7fbcdff1e248)
>     at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
> #5  vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290) at
/mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882
>
> Looks to me that the calculation moved into compute_max_dead_tuples()
> continues to use use an allocation ceiling
>                 maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> but the actual allocation now is
>
> #define SizeOfLVDeadTuples(cnt) \
>                 add_size((offsetof(LVDeadTuples, itemptrs)), \
>                                  mul_size(sizeof(ItemPointerData), cnt))
>
> i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
> account.
>

Right, I think we need to take it into account in both places in
compute_max_dead_tuples():

maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
..
maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 21 Jan 2020 at 15:35, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 21, 2020 at 11:30 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> > > Pushed, after fixing these two comments.
> >
> > When attempting to vacuum a large table I just got:
> >
> > postgres=# vacuum FREEZE ;
> > ERROR:  invalid memory alloc request size 1073741828
> >
> > #0  palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
> > #1  0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8,
relblocks=24686152)
> >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
> > #2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>,
params=0x7ffdf8c00290,onerel=<optimized out>)
 
> >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
> > #3  heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>)
> >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
> > #4  0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290,
rel=0x7fbcdff1e248)
> >     at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
> > #5  vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290) at
/mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882
> >
> > Looks to me that the calculation moved into compute_max_dead_tuples()
> > continues to use use an allocation ceiling
> >                 maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> > but the actual allocation now is
> >
> > #define SizeOfLVDeadTuples(cnt) \
> >                 add_size((offsetof(LVDeadTuples, itemptrs)), \
> >                                  mul_size(sizeof(ItemPointerData), cnt))
> >
> > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
> > account.
> >
>
> Right, I think we need to take into account in both the places in
> compute_max_dead_tuples():
>
> maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
> ..
> maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
>
>

Agreed. The attached patch should fix this issue.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 21, 2020 at 12:11 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 21 Jan 2020 at 15:35, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jan 21, 2020 at 11:30 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> > > > Pushed, after fixing these two comments.
> > >
> > > When attempting to vacuum a large table I just got:
> > >
> > > postgres=# vacuum FREEZE ;
> > > ERROR:  invalid memory alloc request size 1073741828
> > >
> > > #0  palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
> > > #1  0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8,
relblocks=24686152)
> > >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
> > > #2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>,
params=0x7ffdf8c00290,onerel=<optimized out>)
 
> > >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
> > > #3  heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>)
> > >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
> > > #4  0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290,
rel=0x7fbcdff1e248)
> > >     at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
> > > #5  vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290) at
/mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882
> > >
> > > Looks to me that the calculation moved into compute_max_dead_tuples()
> > > continues to use use an allocation ceiling
> > >                 maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> > > but the actual allocation now is
> > >
> > > #define SizeOfLVDeadTuples(cnt) \
> > >                 add_size((offsetof(LVDeadTuples, itemptrs)), \
> > >                                  mul_size(sizeof(ItemPointerData), cnt))
> > >
> > > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
> > > account.
> > >
> >
> > Right, I think we need to take into account in both the places in
> > compute_max_dead_tuples():
> >
> > maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
> > ..
> > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> >
> >
>
> Agreed. Attached patch should fix this issue.
>

if (useindex)
  {
- maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+ maxtuples = ((vac_work_mem * 1024L) - SizeOfLVDeadTuplesHeader) /
sizeof(ItemPointerData);

SizeOfLVDeadTuplesHeader is not defined by the patch.  Do you think it
makes sense to add a comment here about the calculation?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 21 Jan 2020 at 16:13, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 21, 2020 at 12:11 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 21 Jan 2020 at 15:35, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jan 21, 2020 at 11:30 AM Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> > > > > Pushed, after fixing these two comments.
> > > >
> > > > When attempting to vacuum a large table I just got:
> > > >
> > > > postgres=# vacuum FREEZE ;
> > > > ERROR:  invalid memory alloc request size 1073741828
> > > >
> > > > #0  palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
> > > > #1  0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8,
relblocks=24686152)
> > > >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
> > > > #2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>,
params=0x7ffdf8c00290,onerel=<optimized out>)
 
> > > >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
> > > > #3  heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>)
> > > >     at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
> > > > #4  0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290,
rel=0x7fbcdff1e248)
> > > >     at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
> > > > #5  vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290) at
/mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882
> > > >
> > > > Looks to me that the calculation moved into compute_max_dead_tuples()
> > > > continues to use use an allocation ceiling
> > > >                 maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> > > > but the actual allocation now is
> > > >
> > > > #define SizeOfLVDeadTuples(cnt) \
> > > >                 add_size((offsetof(LVDeadTuples, itemptrs)), \
> > > >                                  mul_size(sizeof(ItemPointerData), cnt))
> > > >
> > > > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
> > > > account.
> > > >
> > >
> > > Right, I think we need to take into account in both the places in
> > > compute_max_dead_tuples():
> > >
> > > maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
> > > ..
> > > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> > >
> > >
> >
> > Agreed. Attached patch should fix this issue.
> >
>
> if (useindex)
>   {
> - maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
> + maxtuples = ((vac_work_mem * 1024L) - SizeOfLVDeadTuplesHeader) /
> sizeof(ItemPointerData);
>
> SizeOfLVDeadTuplesHeader is not defined by patch.  Do you think it
> makes sense to add a comment here about the calculation?

Oops, it should be SizeOfLVDeadTuples. Attached an updated version.

I defined two macros: SizeOfLVDeadTuples is the size of the LVDeadTuples
struct itself, and SizeOfDeadTuples is the total size including the
LVDeadTuples struct and the dead tuple array.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 21, 2020 at 12:51 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 21 Jan 2020 at 16:13, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > SizeOfLVDeadTuplesHeader is not defined by patch.  Do you think it
> > makes sense to add a comment here about the calculation?
>
> Oops, it should be SizeOfLVDeadTuples. Attached updated version.
>
> I defined two macros: SizeOfLVDeadTuples is the size of LVDeadTuples
> struct and SizeOfDeadTuples is the size including LVDeadTuples struct
> and dead tuples.
>

I have reproduced the issue by defining MaxAllocSize as 10240000 and
then, during debugging, skipping the check related to LAZY_ALLOC_TUPLES.
With the patch applied, the problem is fixed for me.  I have slightly
modified your patch to define the macros along the lines of the existing
macros TXID_SNAPSHOT_SIZE and TXID_SNAPSHOT_MAX_NXIP.  What do you think
about it?

Andres, see if you get a chance to run the test again with the
attached patch, otherwise, I will commit it tomorrow morning.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Jan 21, 2020 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 21, 2020 at 12:51 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 21 Jan 2020 at 16:13, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > SizeOfLVDeadTuplesHeader is not defined by patch.  Do you think it
> > > makes sense to add a comment here about the calculation?
> >
> > Oops, it should be SizeOfLVDeadTuples. Attached updated version.
> >
> > I defined two macros: SizeOfLVDeadTuples is the size of LVDeadTuples
> > struct and SizeOfDeadTuples is the size including LVDeadTuples struct
> > and dead tuples.
> >
>
> I have reproduced the issue by defining MaxAllocSize as 10240000 and
> then during debugging, skipped the check related to LAZY_ALLOC_TUPLES.
> After patch, it fixes the problem for me.  I have slightly modified
> your patch to define the macros on the lines of existing macros
> TXID_SNAPSHOT_SIZE and TXID_SNAPSHOT_MAX_NXIP.  What do you think
> about it?
>
> Andres, see if you get a chance to run the test again with the
> attached patch, otherwise, I will commit it tomorrow morning.
>
The patch looks fine to me, except that we had better use parentheses
around the variable passed to the macro.

+#define MAXDEADTUPLES(max_size) \
+ ((max_size - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData))

change to ->

+#define MAXDEADTUPLES(max_size) \
+ (((max_size) - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData))
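
Just to illustrate the general hazard with a tiny standalone example
(nothing to do with the patch itself, only why the extra parentheses matter
when the argument is an expression):

/* Standalone illustration of why macro arguments should be parenthesized. */
#include <stdio.h>

#define HALF_BAD(x)  (x / 2)            /* argument not parenthesized */
#define HALF_GOOD(x) ((x) / 2)

int
main(void)
{
    /* HALF_BAD(3 + 5) expands to (3 + 5 / 2) = 5, HALF_GOOD(3 + 5) to ((3 + 5) / 2) = 4 */
    printf("%d %d\n", HALF_BAD(3 + 5), HALF_GOOD(3 + 5));
    return 0;
}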

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 21 Jan 2020 at 18:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 21, 2020 at 12:51 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 21 Jan 2020 at 16:13, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > SizeOfLVDeadTuplesHeader is not defined by patch.  Do you think it
> > > makes sense to add a comment here about the calculation?
> >
> > Oops, it should be SizeOfLVDeadTuples. Attached updated version.
> >
> > I defined two macros: SizeOfLVDeadTuples is the size of LVDeadTuples
> > struct and SizeOfDeadTuples is the size including LVDeadTuples struct
> > and dead tuples.
> >
>
> I have reproduced the issue by defining MaxAllocSize as 10240000 and
> then during debugging, skipped the check related to LAZY_ALLOC_TUPLES.
> After patch, it fixes the problem for me.  I have slightly modified
> your patch to define the macros on the lines of existing macros
> TXID_SNAPSHOT_SIZE and TXID_SNAPSHOT_MAX_NXIP.  What do you think
> about it?

Thank you for updating the patch. Yeah, MAXDEADTUPLES is better than
what I did in the previous version of the patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 21 Jan 2020 at 18:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > I have reproduced the issue by defining MaxAllocSize as 10240000 and
> > then during debugging, skipped the check related to LAZY_ALLOC_TUPLES.
> > After patch, it fixes the problem for me.  I have slightly modified
> > your patch to define the macros on the lines of existing macros
> > TXID_SNAPSHOT_SIZE and TXID_SNAPSHOT_MAX_NXIP.  What do you think
> > about it?
>
> Thank you for updating the patch. Yeah MAXDEADTUPLES is better than
> what I did in the previous version patch.
>

Pushed after making the change suggested by Dilip.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> Thank you for updating the patch. Yeah MAXDEADTUPLES is better than
> what I did in the previous version patch.
>

Would you like to resubmit your vacuumdb utility patch for this
enhancement?  I see an old version of it, and it seems to me that you
need to update that patch.

+ if (optarg != NULL)
+ {
+ parallel_workers = atoi(optarg);
+ if (parallel_workers <= 0)
+ {
+ pg_log_error("number of parallel workers must be at least 1");
+ exit(1);
+ }
+ }

This will no longer be true.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 22 Jan 2020 at 11:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Thank you for updating the patch. Yeah MAXDEADTUPLES is better than
> > what I did in the previous version patch.
> >
>
> Would you like to resubmit your vacuumdb utility patch for this
> enhancement?  I see some old version of it and it seems to me that you
> need to update that patch.
>
> + if (optarg != NULL)
> + {
> + parallel_workers = atoi(optarg);
> + if (parallel_workers <= 0)
> + {
> + pg_log_error("number of parallel workers must be at least 1");
> + exit(1);
> + }
> + }
>
> This will no longer be true.

Attached the updated version patch.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 22 Jan 2020 at 11:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Thank you for updating the patch. Yeah MAXDEADTUPLES is better than
> > > what I did in the previous version patch.
> > >
> >
> > Would you like to resubmit your vacuumdb utility patch for this
> > enhancement?  I see some old version of it and it seems to me that you
> > need to update that patch.
> >
> > + if (optarg != NULL)
> > + {
> > + parallel_workers = atoi(optarg);
> > + if (parallel_workers <= 0)
> > + {
> > + pg_log_error("number of parallel workers must be at least 1");
> > + exit(1);
> > + }
> > + }
> >
> > This will no longer be true.
>
> Attached the updated version patch.
>

Thanks, Sawada-san, for the rebased patch.

I reviewed and tested this patch.  It looks good to me.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 22 Jan 2020 at 11:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Thank you for updating the patch. Yeah MAXDEADTUPLES is better than
> > > > what I did in the previous version patch.
> > > >
> > >
> > > Would you like to resubmit your vacuumdb utility patch for this
> > > enhancement?  I see some old version of it and it seems to me that you
> > > need to update that patch.
> > >
> > > + if (optarg != NULL)
> > > + {
> > > + parallel_workers = atoi(optarg);
> > > + if (parallel_workers <= 0)
> > > + {
> > > + pg_log_error("number of parallel workers must be at least 1");
> > > + exit(1);
> > > + }
> > > + }
> > >
> > > This will no longer be true.
> >
> > Attached the updated version patch.
> >
>
> Thanks Sawada-san for the re-based patch.
>
> I reviewed and tested this patch.  Patch looks good to me.

As suggested offline by Amit Kapila, I verified the vacuumdb "-P" option
functionality against older server versions (<13), and I also tested
vacuumdb with the "-j" option combined with "-P". Everything works as
expected, and I didn't find any issues with these options.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Attached the updated version patch.
> >
> > Thanks Sawada-san for the re-based patch.
> >
> > I reviewed and tested this patch.  Patch looks good to me.
>
> As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> functionality with older versions(<13) and also I tested vacuumdb by
> giving "-j" option with "-P". All are working as per expectation and I
> didn't find any issue with these options.
>

I have made a few modifications to the patch.

1. I think we should try to block the combined use of the 'full' and
'parallel' options in the utility rather than allowing the server to
return an error (a rough sketch follows below).
2. It is better to handle the 'P' option in getopt_long in the order of
its declaration in the long_options array.
3. Added an Assert on the server version while handling the parallel option.
4. Added a few sentences in the documentation.

What do you guys think of the attached?
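
For point 1, the client-side check could look roughly like the following
standalone sketch (placeholder structure and names, not the actual vacuumdb
code; the error wording is only illustrative):

/*
 * Standalone sketch of the client-side check in point 1: reject --full
 * together with --parallel before anything is sent to the server.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct vacuum_opts
{
    bool        full;               /* --full given? */
    int         parallel_workers;   /* -1 means --parallel was not given */
};

static void
check_options(const struct vacuum_opts *opts)
{
    if (opts->full && opts->parallel_workers >= 0)
    {
        fprintf(stderr, "cannot use the \"parallel\" option when performing full vacuum\n");
        exit(1);
    }
}

int
main(void)
{
    struct vacuum_opts opts = {.full = true, .parallel_workers = 2};

    check_options(&opts);       /* prints the error and exits */
    return 0;
}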

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Attached the updated version patch.
> > >
> > > Thanks Sawada-san for the re-based patch.
> > >
> > > I reviewed and tested this patch.  Patch looks good to me.
> >
> > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> > functionality with older versions(<13) and also I tested vacuumdb by
> > giving "-j" option with "-P". All are working as per expectation and I
> > didn't find any issue with these options.
> >
>
> I have made few modifications in the patch.
>
> 1. I think we should try to block the usage of 'full' and 'parallel'
> option in the utility rather than allowing the server to return an
> error.
> 2. It is better to handle 'P' option in getopt_long in the order of
> its declaration in long_options array.
> 3. Added an Assert for server version while handling of parallel option.
> 4. Added a few sentences in the documentation.
>
> What do you guys think of the attached?
>

I did one more review round.  Below are some review comments:

1.
-P, --parallel=PARALLEL_DEGREE  do parallel vacuum

I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so we can use like "degree for parallel vacuum"

2. The error messages for the FULL-plus-parallel case are inconsistent:
Error for normal vacuum:
ERROR:  cannot specify both FULL and PARALLEL options

Error for vacuumdb:
error: cannot use the "parallel" option when performing full

I think we should use the 2nd error message in both places, as it gives more clarity.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 28, 2020 at 2:13 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
> > <mahi6run@gmail.com> wrote:
> > >
> > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > >
> > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > Attached the updated version patch.
> > > >
> > > > Thanks Sawada-san for the re-based patch.
> > > >
> > > > I reviewed and tested this patch.  Patch looks good to me.
> > >
> > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> > > functionality with older versions(<13) and also I tested vacuumdb by
> > > giving "-j" option with "-P". All are working as per expectation and I
> > > didn't find any issue with these options.
> > >
> >
> > I have made few modifications in the patch.
> >
> > 1. I think we should try to block the usage of 'full' and 'parallel'
> > option in the utility rather than allowing the server to return an
> > error.
> > 2. It is better to handle 'P' option in getopt_long in the order of
> > its declaration in long_options array.
> > 3. Added an Assert for server version while handling of parallel option.
> > 4. Added a few sentences in the documentation.
> >
> > What do you guys think of the attached?
> >
>
> I took one more review round.  Below are some review comments:
>
> 1.
> -P, --parallel=PARALLEL_DEGREE  do parallel vacuum
>
> I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so we can
uselike "degree for parallel vacuum"
 
>

I am not sure if 'degree' makes it very clear.  How about "use this
many background workers for vacuum, if available"?

> 2. Error message inconsistent for FULL and parallel option:
> Error for normal vacuum:
> ERROR:  cannot specify both FULL and PARALLEL options
>
> Error for vacuumdb:
> error: cannot use the "parallel" option when performing full
>
> I think, both the places, we should use 2nd error message as it is giving more clarity.
>

Which message are you advocating here: "cannot use the "parallel"
option when performing full" or "cannot specify both FULL and PARALLEL
options"?  The message used in this patch is chosen mainly for
consistency with nearby messages in the vacuumdb utility.  If you are
advocating changing "cannot specify both FULL and PARALLEL options"
to match what we are using in this patch, then it is better to do that
separately and maybe ask for more opinions.  I think I understand your
desire to use the same message in both places, but it seems to me the
messages used in the two places are there to maintain consistency with
the nearby code or with the message used for a similar purpose.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, 25 Jan 2020 at 15:41, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Attached the updated version patch.
> > >
> > > Thanks Sawada-san for the re-based patch.
> > >
> > > I reviewed and tested this patch.  Patch looks good to me.
> >
> > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> > functionality with older versions(<13) and also I tested vacuumdb by
> > giving "-j" option with "-P". All are working as per expectation and I
> > didn't find any issue with these options.
> >
>
> I have made few modifications in the patch.
>
> 1. I think we should try to block the usage of 'full' and 'parallel'
> option in the utility rather than allowing the server to return an
> error.
> 2. It is better to handle 'P' option in getopt_long in the order of
> its declaration in long_options array.
> 3. Added an Assert for server version while handling of parallel option.
> 4. Added a few sentences in the documentation.
>
> What do you guys think of the attached?

Your changes look good to me.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Tue, 28 Jan 2020 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 28, 2020 at 2:13 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
> > > <mahi6run@gmail.com> wrote:
> > > >
> > > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > > >
> > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > >
> > > > > > Attached the updated version patch.
> > > > >
> > > > > Thanks Sawada-san for the re-based patch.
> > > > >
> > > > > I reviewed and tested this patch.  Patch looks good to me.
> > > >
> > > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> > > > functionality with older versions(<13) and also I tested vacuumdb by
> > > > giving "-j" option with "-P". All are working as per expectation and I
> > > > didn't find any issue with these options.
> > > >
> > >
> > > I have made few modifications in the patch.
> > >
> > > 1. I think we should try to block the usage of 'full' and 'parallel'
> > > option in the utility rather than allowing the server to return an
> > > error.
> > > 2. It is better to handle 'P' option in getopt_long in the order of
> > > its declaration in long_options array.
> > > 3. Added an Assert for server version while handling of parallel option.
> > > 4. Added a few sentences in the documentation.
> > >
> > > What do you guys think of the attached?
> > >
> >
> > I took one more review round.  Below are some review comments:
> >
> > 1.
> > -P, --parallel=PARALLEL_DEGREE  do parallel vacuum
> >
> > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so we can use like "degree for parallel vacuum"
> >
>
> I am not sure if 'degree' makes it very clear.  How about "use this
> many background workers for vacuum, if available"?

If many background workers are available, then we use them automatically (parallel vacuum is the default). This option is to put a limit on the background workers (a limit for vacuum workers) to be used by the vacuum process.  So I think we could use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel vacuum workers".

>
> > 2. Error message inconsistent for FULL and parallel option:
> > Error for normal vacuum:
> > ERROR:  cannot specify both FULL and PARALLEL options
> >
> > Error for vacuumdb:
> > error: cannot use the "parallel" option when performing full
> >
> > I think, both the places, we should use 2nd error message as it is giving more clarity.
> >
>
> Which message are you advocating here "cannot use the "parallel"
> option when performing full" or "cannot specify both FULL and PARALLEL
> options"?  The message used in this patch is mainly because of

I mean that "cannot use the "parallel" option when performing full" should be used in both the places.

> consistency with nearby messages in the vacuumdb utility. If you are
> advocating to change "cannot specify both FULL and PARALLEL options"
> to match what we are using in this patch, then it is better to do that
> separately and maybe ask for more opinions.  I think I understand your
> desire to use the same message at both places, but it seems to me the
> messages used in both the places are to maintain consistency with the
> nearby code or the message used for a similar purpose.

Okay, I agree with your points. Let's keep it as it is.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 28, 2020 at 12:04 PM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> On Tue, 28 Jan 2020 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jan 28, 2020 at 2:13 AM Mahendra Singh Thalor
> > <mahi6run@gmail.com> wrote:
> > >
> > > On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
> > > > <mahi6run@gmail.com> wrote:
> > > > >
> > > > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > > >
> > > > > > > Attached the updated version patch.
> > > > > >
> > > > > > Thanks Sawada-san for the re-based patch.
> > > > > >
> > > > > > I reviewed and tested this patch.  Patch looks good to me.
> > > > >
> > > > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> > > > > functionality with older versions(<13) and also I tested vacuumdb by
> > > > > giving "-j" option with "-P". All are working as per expectation and I
> > > > > didn't find any issue with these options.
> > > > >
> > > >
> > > > I have made few modifications in the patch.
> > > >
> > > > 1. I think we should try to block the usage of 'full' and 'parallel'
> > > > option in the utility rather than allowing the server to return an
> > > > error.
> > > > 2. It is better to handle 'P' option in getopt_long in the order of
> > > > its declaration in long_options array.
> > > > 3. Added an Assert for server version while handling of parallel option.
> > > > 4. Added a few sentences in the documentation.
> > > >
> > > > What do you guys think of the attached?
> > > >
> > >
> > > I took one more review round.  Below are some review comments:
> > >
> > > 1.
> > > -P, --parallel=PARALLEL_DEGREE  do parallel vacuum
> > >
> > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so we
canuse like "degree for parallel vacuum"
 
> > >
> >
> > I am not sure if 'degree' makes it very clear.  How about "use this
> > many background workers for vacuum, if available"?
>
> If background workers are many, then automatically, we are using them(by default parallel vacuum). This option is to
> put limit on background workers(limit for vacuum workers) to be used by vacuum process.
>

I don't think the option is just to specify the max limit, because
that is generally controlled by GUC parameters.  This option allows
users to specify the number of workers for cases where they have more
knowledge about the size/type of the indexes.  In some cases, the user
might be able to make a better decision, and that is the reason we
added this option in the first place.

> So I think, we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel
> vacuum workers"
>

Hmm, I feel what I suggested is better because of the above explanation.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Mahendra Singh Thalor
Date:
On Tue, 28 Jan 2020 at 12:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 28, 2020 at 12:04 PM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Tue, 28 Jan 2020 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jan 28, 2020 at 2:13 AM Mahendra Singh Thalor
> > > <mahi6run@gmail.com> wrote:
> > > >
> > > > On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
> > > > > <mahi6run@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > > > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > > > >
> > > > > > > > Attached the updated version patch.
> > > > > > >
> > > > > > > Thanks Sawada-san for the re-based patch.
> > > > > > >
> > > > > > > I reviewed and tested this patch.  Patch looks good to me.
> > > > > >
> > > > > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> > > > > > functionality with older versions(<13) and also I tested vacuumdb by
> > > > > > giving "-j" option with "-P". All are working as per expectation and I
> > > > > > didn't find any issue with these options.
> > > > > >
> > > > >
> > > > > I have made few modifications in the patch.
> > > > >
> > > > > 1. I think we should try to block the usage of 'full' and 'parallel'
> > > > > option in the utility rather than allowing the server to return an
> > > > > error.
> > > > > 2. It is better to handle 'P' option in getopt_long in the order of
> > > > > its declaration in long_options array.
> > > > > 3. Added an Assert for server version while handling of parallel option.
> > > > > 4. Added a few sentences in the documentation.
> > > > >
> > > > > What do you guys think of the attached?
> > > > >
> > > >
> > > > I took one more review round.  Below are some review comments:
> > > >
> > > > 1.
> > > > -P, --parallel=PARALLEL_DEGREE  do parallel vacuum
> > > >
> > > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so
> > > > we can use like "degree for parallel vacuum"
> > > >
> > >
> > > I am not sure if 'degree' makes it very clear.  How about "use this
> > > many background workers for vacuum, if available"?
> >
> > If background workers are many, then automatically, we are using them(by default parallel vacuum). This option is
> > to put limit on background workers(limit for vacuum workers) to be used by vacuum process.
> >
>
> I don't think that the option is just to specify the max limit because
> that is generally controlled by guc parameters.  This option allows
> users to specify the number of workers for the cases where he has more
> knowledge about the size/type of indexes.  In some cases, the user
> might be able to make a better decision and that was the reason we
> have added this option in the first place.
>
> > So I think, we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel
> > vacuum workers"
> >
>
> Hmm, I feel what I suggested is better because of the above explanation.

Agreed.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 28, 2020 at 12:53 PM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> > > > > 1.
> > > > > -P, --parallel=PARALLEL_DEGREE  do parallel vacuum
> > > > >
> > > > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum
> > > > > so we can use like "degree for parallel vacuum"
> > > > >
> > > >
> > > > I am not sure if 'degree' makes it very clear.  How about "use this
> > > > many background workers for vacuum, if available"?
> > >
> > > If background workers are many, then automatically, we are using them(by default parallel vacuum). This option is
> > > to put limit on background workers(limit for vacuum workers) to be used by vacuum process.
> > >
> >
> > I don't think that the option is just to specify the max limit because
> > that is generally controlled by guc parameters.  This option allows
> > users to specify the number of workers for the cases where he has more
> > knowledge about the size/type of indexes.  In some cases, the user
> > might be able to make a better decision and that was the reason we
> > have added this option in the first place.
> >
> > > So I think, we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel
> > > vacuum workers"
> > >
> >
> > Hmm, I feel what I suggested is better because of the above explanation.
>
> Agreed.
>

Okay, thanks for the review.  Attached is an updated patch. I have
additionally run pgindent.  I am planning to commit the attached
tomorrow unless I see more comments.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Jan 28, 2020 at 8:56 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 25 Jan 2020 at 15:41, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > I have made few modifications in the patch.
> >
> > 1. I think we should try to block the usage of 'full' and 'parallel'
> > option in the utility rather than allowing the server to return an
> > error.
> > 2. It is better to handle 'P' option in getopt_long in the order of
> > its declaration in long_options array.
> > 3. Added an Assert for server version while handling of parallel option.
> > 4. Added a few sentences in the documentation.
> >
> > What do you guys think of the attached?
>
> Your changes look good me.
>

Thanks for the review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 28 Jan 2020 at 18:47, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 28, 2020 at 12:53 PM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > > > > > 1.
> > > > > > -P, --parallel=PARALLEL_DEGREE  do parallel vacuum
> > > > > >
> > > > > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum
> > > > > > so we can use like "degree for parallel vacuum"
> > > > > >
> > > > >
> > > > > I am not sure if 'degree' makes it very clear.  How about "use this
> > > > > many background workers for vacuum, if available"?
> > > >
> > > > If background workers are many, then automatically, we are using them(by default parallel vacuum). This option
> > > > is to put limit on background workers(limit for vacuum workers) to be used by vacuum process.
> > > >
> > >
> > > I don't think that the option is just to specify the max limit because
> > > that is generally controlled by guc parameters.  This option allows
> > > users to specify the number of workers for the cases where he has more
> > > knowledge about the size/type of indexes.  In some cases, the user
> > > might be able to make a better decision and that was the reason we
> > > have added this option in the first place.
> > >
> > > > > So I think, we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel
> > > > > vacuum workers"
> > > >
> > >
> > > Hmm, I feel what I suggested is better because of the above explanation.
> >
> > Agreed.
> >
>
> Okay, thanks for the review.  Attached is an updated patch. I have
> additionally run pgindent.  I am planning to commit the attached
> tomorrow unless I see more comments.

Thank you for committing it!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Jan 29, 2020 at 7:20 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> >
> > Okay, thanks for the review.  Attached is an updated patch. I have
> > additionally run pgindent.  I am planning to commit the attached
> > tomorrow unless I see more comments.
>
> Thank you for committing it!
>

I have marked this patch as committed in CF.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com