Thread: cost based vacuum (parallel)

cost based vacuum (parallel)

From
Amit Kapila
Date:
For parallel vacuum [1], we were discussing what is the best way to
divide the cost among parallel workers but we didn't get many inputs
apart from people who are very actively involved in patch development.
I feel that we need some more inputs before we finalize anything, so
starting a new thread.

The initial version of the patch has a very rudimentary way of doing
it: each parallel vacuum worker operates independently w.r.t. vacuum
delay and cost.  This will lead to more I/O in the system than the user
intended.  Assume that the overall I/O allowed for the vacuum operation
is X, after which it will sleep for some time, reset the balance and
continue.  In the patch, each worker is allowed to perform X before it
sleeps, and there is also no coordination with the master backend,
which would have done some I/O for the heap.  So, in the worst-case
scenario, there can be n times more I/O, where n is the number of
workers doing the parallel operation.  This is somewhat similar to the
memory usage problem with a parallel query, where each worker is
allowed to use up to work_mem of memory.  We can say that users of a
parallel operation can expect more system resources to be used as they
want to get the operation done faster, so we are fine with this.
However, I am not sure that is the right thing, so we should try to
come up with some solution for it, and if the solution is too complex,
then probably we can think of documenting such behavior.

The two approaches to solve this problem being discussed in that
thread [1] are as follows:
(a) Allow the parallel workers and master backend to have a shared
view of vacuum cost related parameters (mainly VacuumCostBalance) and
allow each worker to update it and then based on that decide whether
it needs to sleep.  Sawada-San has done the POC for this approach.
See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
drawback of this approach could be that we allow the worker to sleep
even though the I/O has been performed by some other worker.
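
To make the mechanism concrete, here is a minimal sketch of what such a
shared-balance check could look like.  This is only an illustration of
the idea, not code from the v32 POC patch; the struct and function
names are made up, and a real implementation would hook into the
existing vacuum_delay_point() machinery.

#include "postgres.h"
#include "miscadmin.h"
#include "port/atomics.h"

/* Illustrative shared state that would live in dynamic shared memory. */
typedef struct PVShared
{
    pg_atomic_uint32 cost_balance;   /* shared VacuumCostBalance */
} PVShared;

static void
shared_vacuum_delay_point(PVShared *shared, uint32 my_cost)
{
    /* Add the cost accumulated since the last check to the shared balance. */
    uint32      balance = pg_atomic_add_fetch_u32(&shared->cost_balance,
                                                  my_cost);

    if (balance >= VacuumCostLimit)
    {
        /*
         * Whichever backend observes the overrun sleeps and deducts one
         * round from the shared balance -- even if most of the I/O was
         * done by some other worker, which is exactly the drawback
         * mentioned above.
         */
        pg_usleep((long) (VacuumCostDelay * 1000));
        pg_atomic_fetch_sub_u32(&shared->cost_balance, VacuumCostLimit);
    }
}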

(b) The other idea could be that we split the I/O among workers,
similar to what we do for autovacuum workers (see
autovac_balance_cost).  The basic idea would be that before launching
workers, we compute the remaining I/O (the heap operation would have
used some of it) after which we need to sleep, and split it equally
across workers.  Here, we are primarily thinking of dividing the
VacuumCostBalance and VacuumCostLimit parameters.  Once the workers
are finished, they need to let the master backend know how much I/O
they have consumed, and then the master backend can add it to its
current I/O consumed.  I think we also need to rebalance the cost of
the remaining workers once some of the workers exit.  Dilip has
prepared a POC patch for this, see 0002-POC-divide-vacuum-cost-limit
in email [3].
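
For illustration, the split could look something like the following
rough sketch (this is not the 0002 POC patch itself; the function and
parameter names are made up):

#include "postgres.h"
#include "miscadmin.h"

/*
 * Give each worker an equal share of the cost limit and of the balance
 * the leader has already accumulated while processing the heap, so the
 * total I/O allowed across the leader and all workers stays roughly
 * what a single backend would have been allowed.
 */
static void
divide_vacuum_cost_among_workers(int nworkers,
                                 int *worker_cost_limit,
                                 int *worker_cost_balance)
{
    *worker_cost_limit = Max(VacuumCostLimit / nworkers, 1);
    *worker_cost_balance = VacuumCostBalance / nworkers;
}

When a worker exits, it would report its accumulated balance back to
the leader, and the leader would recompute the shares for the remaining
workers, as described above.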

I think approach-2 is better in throttling the system as it doesn't
have the drawback of the first approach, but it might be a bit tricky
to implement.

As of now, POCs for both approaches have been developed, and we see
similar results for both, but we have only tested simpler cases where
each worker has a similar amount of I/O to perform.

Thoughts?


[1] - https://commitfest.postgresql.org/25/1774/
[2] - https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAFiTN-thU-z8f04jO7xGMu5yUUpTpsBTvBrFW6EhRf-jGvEz%3Dg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Darafei "Komяpa" Praliaskouski
Date:

This is somewhat similar to a memory usage problem with a
parallel query where each worker is allowed to use up to work_mem of
memory.  We can say that the users using parallel operation can expect
more system resources to be used as they want to get the operation
done faster, so we are fine with this.  However, I am not sure if that
is the right thing, so we should try to come up with some solution for
it and if the solution is too complex, then probably we can think of
documenting such behavior.

In cloud environments (Amazon + gp2) there's a budget on input/output operations. If you cross it for a long time, everything starts looking like you are working with a floppy disk.

For ease of configuration, I would need a "max_vacuum_disk_iops" that would limit the number of input-output operations by all of the vacuums in the system. If I set it to less than the value of the budget refill, I can be sure that no vacuum runs fast enough to impact any sibling query. 

There's also value in non-throttled VACUUM for smaller tables. On gp2 such things will be consumed out of the surge budget, and its size is known to the sysadmin. Let's call it "max_vacuum_disk_surge_iops" - if a relation has fewer blocks than this value and it's a blocking situation of any kind (antiwraparound, interactive console, ...) - go on and run without throttling.

For how to balance the cost: if we know the number of vacuum processes that were running in the previous second, we can just divide the slot for this iteration by that previous number. 

To correct for overshoots, we can subtract the previous second's overshoot from the next one's. That would also allow accounting for surge budget usage and let it refill, pausing all autovacuum after a manual one for some time.

Accounting for and limiting the operation count more often than once a second doesn't bring useful precision for this use case. 

Please don't forget that processing one page can become several iops (read, write, WAL).
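
A toy sketch of that bookkeeping, just to make the arithmetic concrete
(the GUC and variables here are purely hypothetical; nothing like this
exists today):

#include "postgres.h"

/* Hypothetical per-second budget bookkeeping. */
static int  max_vacuum_disk_iops = 1000;    /* imagined GUC */
static int  prev_second_vacuums = 1;        /* vacuums active last second */
static int  prev_second_overshoot = 0;      /* iops consumed above budget */

static int
iops_slot_for_this_second(void)
{
    /* Split the refill among last second's vacuums, minus any overshoot. */
    int         slot = max_vacuum_disk_iops / Max(prev_second_vacuums, 1);

    return Max(slot - prev_second_overshoot, 0);
}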

Does this make sense? :)

 

Re: cost based vacuum (parallel)

From
Masahiko Sawada
Date:
On Mon, Nov 4, 2019 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think approach-2 is better in throttling the system as it doesn't
> have the drawback of the first approach, but it might be a bit tricky
> to implement.

I might be missing something, but I think approach-2 could have the
same drawback as approach-1, depending on which index pages are loaded
in shared buffers and on the vacuum delay settings.  Is that right?

Regards,

---
Masahiko Sawada      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Mon, Nov 4, 2019 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Nov 4, 2019 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think approach-2 is better in throttling the system as it doesn't
> > have the drawback of the first approach, but it might be a bit tricky
> > to implement.
>
> I might be missing something but I think that there could be the
> drawback of the approach-1 even on approach-2 depending on index pages
> loaded on the shared buffer and the vacuum delay setting.
>

Can you be a bit more specific about this?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Mon, Nov 4, 2019 at 1:03 PM Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:
>>
>>
>> This is somewhat similar to a memory usage problem with a
>> parallel query where each worker is allowed to use up to work_mem of
>> memory.  We can say that the users using parallel operation can expect
>> more system resources to be used as they want to get the operation
>> done faster, so we are fine with this.  However, I am not sure if that
>> is the right thing, so we should try to come up with some solution for
>> it and if the solution is too complex, then probably we can think of
>> documenting such behavior.
>
>
> In cloud environments (Amazon + gp2) there's a budget on input/output operations. If you cross it for long time,
> everything starts looking like you work with a floppy disk.
>
> For the ease of configuration, I would need a "max_vacuum_disk_iops" that would limit number of input-output
> operations by all of the vacuums in the system. If I set it to less than value of budget refill, I can be sure than that
> no vacuum runs too fast to impact any sibling query.
>
> There's also value in non-throttled VACUUM for smaller tables. On gp2 such things will be consumed out of surge
> budget, and its size is known to sysadmin. Let's call it "max_vacuum_disk_surge_iops" - if a relation has less blocks
> than this value and it's a blocking in any way situation (antiwraparound, interactive console, ...) - go on and run
> without throttling.
>

I think the need for these things can be addressed by the current
cost-based-vacuum parameters. See docs [1].  For example, if you set
vacuum_cost_delay to zero, it will allow the operation to be performed
without throttling.

> For how to balance the cost: if we know a number of vacuum processes that were running in the previous second, we can
> just divide a slot for this iteration by that previous number.
>
> To correct for overshots, we can subtract the previous second's overshot from next one's. That would also allow to
> account for surge budget usage and let it refill, pausing all autovacuum after a manual one for some time.
>
> Precision of accounting limiting count of operations more than once a second isn't beneficial for this use case.
>

I think it is better if we find a way to rebalance the cost on worker
exit rather than every second, as it won't change anyway unless a
worker exits.

[1] - https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Masahiko Sawada
Date:
On Mon, 4 Nov 2019 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 4, 2019 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Nov 4, 2019 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I think approach-2 is better in throttling the system as it doesn't
> > > have the drawback of the first approach, but it might be a bit tricky
> > > to implement.
> >
> > I might be missing something but I think that there could be the
> > drawback of the approach-1 even on approach-2 depending on index pages
> > loaded on the shared buffer and the vacuum delay setting.
> >
>
> Can you be a bit more specific about this?

Suppose there are two indexes: one index is entirely loaded in shared
buffers while the other index isn't.  One vacuum worker that processes
the former index hits all pages in shared buffers, but another worker
that processes the latter index reads all pages from either the OS
page cache or disk.  Even if both the cost limit and the cost balance
are split evenly among workers, because the costs of page hits and
page misses are different, it's possible that one vacuum worker sleeps
while other workers are doing I/O.
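
To put rough numbers on it (with the default cost parameters,
vacuum_cost_page_hit = 1 and vacuum_cost_page_miss = 10): the worker
scanning the fully cached index still accumulates 1 per page hit, so
with an evenly split limit it can exceed its share and sleep even
though it has issued no reads, while the worker actually reading from
disk carries on with its I/O.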

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: cost based vacuum (parallel)

From
Jeff Janes
Date:
On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
For parallel vacuum [1], we were discussing what is the best way to
divide the cost among parallel workers but we didn't get many inputs
apart from people who are very actively involved in patch development.
I feel that we need some more inputs before we finalize anything, so
starting a new thread.

Maybe I just don't have experience with the type of system that parallel vacuum is needed for, but if there is any meaningful IO throttling active, then what is the point of doing the vacuum in parallel in the first place?

Cheers,

Jeff

Re: cost based vacuum (parallel)

From
Andres Freund
Date:
Hi,

On 2019-11-04 12:24:35 +0530, Amit Kapila wrote:
> For parallel vacuum [1], we were discussing what is the best way to
> divide the cost among parallel workers but we didn't get many inputs
> apart from people who are very actively involved in patch development.
> I feel that we need some more inputs before we finalize anything, so
> starting a new thread.
> 
> The initial version of the patch has a very rudimentary way of doing
> it which means each parallel vacuum worker operates independently
> w.r.t vacuum delay and cost.

Yea, that seems not ok for cases where vacuum delay is active.

There's also the question of when/why it is beneficial to use
parallelism when you're going to encounter IO limits in all likelihood.


> This will lead to more I/O in the system
> than the user has intended to do.  Assume that the overall I/O allowed
> for vacuum operation is X after which it will sleep for some time,
> reset the balance and continue.  In the patch, each worker will be
> allowed to perform X before which it can sleep and also there is no
> coordination for the same with master backend which would have done
> some I/O for the heap.  So, in the worst-case scenario, there can be n
> times more I/O where n is the number of workers doing the parallel
> operation.  This is somewhat similar to a memory usage problem with a
> parallel query where each worker is allowed to use up to work_mem of
> memory.  We can say that the users using parallel operation can expect
> more system resources to be used as they want to get the operation
> done faster, so we are fine with this.  However, I am not sure if that
> is the right thing, so we should try to come up with some solution for
> it and if the solution is too complex, then probably we can think of
> documenting such behavior.

I mean for parallel query the problem wasn't really introduced in
parallel query, it existed before - and does still - for non-parallel
queries. And there's a complex underlying planning issue. I don't think
this is a good excuse for VACUUM, where none of the complex "number of
paths considered" issues etc apply.


> The two approaches to solve this problem being discussed in that
> thread [1] are as follows:
> (a) Allow the parallel workers and master backend to have a shared
> view of vacuum cost related parameters (mainly VacuumCostBalance) and
> allow each worker to update it and then based on that decide whether
> it needs to sleep.  Sawada-San has done the POC for this approach.
> See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> drawback of this approach could be that we allow the worker to sleep
> even though the I/O has been performed by some other worker.

I don't understand this drawback.


> (b) The other idea could be that we split the I/O among workers
> something similar to what we do for auto vacuum workers (see
> autovac_balance_cost).  The basic idea would be that before launching
> workers, we need to compute the remaining I/O (heap operation would
> have used something) after which we need to sleep and split it equally
> across workers.  Here, we are primarily thinking of dividing
> VacuumCostBalance and VacuumCostLimit parameters.  Once the workers
> are finished, they need to let master backend know how much I/O they
> have consumed and then master backend can add it to it's current I/O
> consumed.  I think we also need to rebalance the cost of remaining
> workers once some of the worker's exit.  Dilip has prepared a POC
> patch for this, see 0002-POC-divide-vacuum-cost-limit in email [3].

(b) doesn't strike me as advantageous. It seems quite possible that you
end up with one worker that has a lot more IO than others, leading to
unnecessary sleeps, even though the actually available IO budget has not
been used up. Quite easy to see how that'd lead to parallel VACUUM
having a lower throughput than a single threaded one.

Greetings,

Andres Freund



Re: cost based vacuum (parallel)

From
Andres Freund
Date:
Hi,

On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > For parallel vacuum [1], we were discussing what is the best way to
> > divide the cost among parallel workers but we didn't get many inputs
> > apart from people who are very actively involved in patch development.
> > I feel that we need some more inputs before we finalize anything, so
> > starting a new thread.
> >
>
> Maybe a I just don't have experience in the type of system that parallel
> vacuum is needed for, but if there is any meaningful IO throttling which is
> active, then what is the point of doing the vacuum in parallel in the first
> place?

I am wondering the same - but to be fair, it's pretty easy to run into
cases where VACUUM is CPU bound. E.g. because most pages are in
shared_buffers, and, compared to the size of the indexes, the number of
tids that need to be pruned is fairly small (also [1]). That means a
lot of pages need to be scanned, without a whole lot of IO going on.
The problem with that is just that the defaults for vacuum throttling
will also apply here; I've never seen anybody tune
vacuum_cost_page_hit = 0, vacuum_cost_page_dirty = 0 or such (in
contrast, the latter is the highest cost currently).  Nor do we reduce
the cost of vacuum_cost_page_dirty for unlogged tables.

So while it doesn't seem unreasonable to want to use cost limiting to
protect against vacuum unexpectedly causing too much IO, especially
read IO, I'm doubtful it has current practical relevance.

I'm wondering how much of the benefit of parallel vacuum really is just
to work around vacuum ringbuffers often massively hurting performance
(see e.g. [2]). Surely not all, but I'd be very unsurprised if it were a
large fraction.

Greetings,

Andres Freund

[1] I don't think the patch addresses this, IIUC it's only running index
    vacuums in parallel, but it's very easy to run into being CPU
    bottlenecked when vacuuming a busily updated table. heap_hot_prune
    can be really expensive, especially with longer update chains (I
    think it may have an O(n^2) worst case even).
[2] https://www.postgresql.org/message-id/20160406105716.fhk2eparljthpzp6%40alap3.anarazel.de



Re: cost based vacuum (parallel)

From
Stephen Frost
Date:
Greetings,

* Jeff Janes (jeff.janes@gmail.com) wrote:
> On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > For parallel vacuum [1], we were discussing what is the best way to
> > divide the cost among parallel workers but we didn't get many inputs
> > apart from people who are very actively involved in patch development.
> > I feel that we need some more inputs before we finalize anything, so
> > starting a new thread.
>
> Maybe a I just don't have experience in the type of system that parallel
> vacuum is needed for, but if there is any meaningful IO throttling which is
> active, then what is the point of doing the vacuum in parallel in the first
> place?

With parallelization across indexes, you could have a situation where
the individual indexes are on different tablespaces with independent
i/o, therefore the parallelization ends up giving you an increase in i/o
throughput, not just additional CPU time.

Thanks,

Stephen


Re: cost based vacuum (parallel)

From
Andres Freund
Date:
Hi,

On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> * Jeff Janes (jeff.janes@gmail.com) wrote:
> > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > For parallel vacuum [1], we were discussing what is the best way to
> > > divide the cost among parallel workers but we didn't get many inputs
> > > apart from people who are very actively involved in patch development.
> > > I feel that we need some more inputs before we finalize anything, so
> > > starting a new thread.
> > 
> > Maybe a I just don't have experience in the type of system that parallel
> > vacuum is needed for, but if there is any meaningful IO throttling which is
> > active, then what is the point of doing the vacuum in parallel in the first
> > place?
> 
> With parallelization across indexes, you could have a situation where
> the individual indexes are on different tablespaces with independent
> i/o, therefore the parallelization ends up giving you an increase in i/o
> throughput, not just additional CPU time.

How's that related to IO throttling being active or not?

Greetings,

Andres Freund



Re: cost based vacuum (parallel)

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > * Jeff Janes (jeff.janes@gmail.com) wrote:
> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > For parallel vacuum [1], we were discussing what is the best way to
> > > > divide the cost among parallel workers but we didn't get many inputs
> > > > apart from people who are very actively involved in patch development.
> > > > I feel that we need some more inputs before we finalize anything, so
> > > > starting a new thread.
> > >
> > > Maybe a I just don't have experience in the type of system that parallel
> > > vacuum is needed for, but if there is any meaningful IO throttling which is
> > > active, then what is the point of doing the vacuum in parallel in the first
> > > place?
> >
> > With parallelization across indexes, you could have a situation where
> > the individual indexes are on different tablespaces with independent
> > i/o, therefore the parallelization ends up giving you an increase in i/o
> > throughput, not just additional CPU time.
>
> How's that related to IO throttling being active or not?

You might find that you have to throttle the IO down when operating
exclusively against one IO channel, but if you have multiple IO channels
then the acceptable IO utilization could be higher as it would be
spread across the different IO channels.

In other words, the overall i/o allowance for a given operation might be
able to be higher if it's spread across multiple i/o channels, as it
wouldn't completely consume the i/o resources of any of them, whereas
with a higher allowance and a single i/o channel, there would likely be
an impact to other operations.

Whether this is really relevant only when it comes to parallel
operations is a bit of an interesting question- these considerations
might not require actual parallel operations, as a single process might
be able to go through multiple indexes concurrently and still hit the
i/o limit that was set for it overall across the tablespaces.  I don't
know that it would actually be interesting or useful to spend the effort
to make that work though, so, from a practical perspective, it's
probably only interesting to think about this when talking about
parallel vacuum.

I've been wondering if the accounting system should consider the cost
per tablespace when there's multiple tablespaces involved, instead of
throttling the overall process without consideration for the
per-tablespace utilization.

Thanks,

Stephen


Re: cost based vacuum (parallel)

From
Andres Freund
Date:
Hi,

On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > With parallelization across indexes, you could have a situation where
> > > the individual indexes are on different tablespaces with independent
> > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > throughput, not just additional CPU time.
> > 
> > How's that related to IO throttling being active or not?
> 
> You might find that you have to throttle the IO down when operating
> exclusively against one IO channel, but if you have multiple IO channels
> then the acceptable IO utilization could be higher as it would be 
> spread across the different IO channels.
> 
> In other words, the overall i/o allowance for a given operation might be
> able to be higher if it's spread across multiple i/o channels, as it
> wouldn't completely consume the i/o resources of any of them, whereas
> with a higher allowance and a single i/o channel, there would likely be
> an impact to other operations.
> 
> As for if this is really relevant only when it comes to parallel
> operations is a bit of an interesting question- these considerations
> might not require actual parallel operations as a single process might
> be able to go through multiple indexes concurrently and still hit the
> i/o limit that was set for it overall across the tablespaces.  I don't
> know that it would actually be interesting or useful to spend the effort
> to make that work though, so, from a practical perspective, it's
> probably only interesting to think about this when talking about
> parallel vacuum.

But you could just apply different budgets for different tablespaces?
That's quite doable independent of parallelism, as we don't have tables
or indexes spanning more than one tablespace.  True, you could then make
the processing of an individual vacuum faster by allowing to utilize
multiple tablespace budgets at the same time.


> I've been wondering if the accounting system should consider the cost
> per tablespace when there's multiple tablespaces involved, instead of
> throttling the overall process without consideration for the
> per-tablespace utilization.

This all seems like a feature proposal, or two, independent of the
patch/question at hand. I think there's a good argument to be had that
we should severely overhaul the current vacuum cost limiting - it's way
way too hard to understand the bandwidth that it's allowed to
consume. But unless one of the proposals makes that measurably harder or
easier, I think we don't gain anything by entangling an already complex
patchset with something new.


Greetings,

Andres Freund



Re: cost based vacuum (parallel)

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > > With parallelization across indexes, you could have a situation where
> > > > the individual indexes are on different tablespaces with independent
> > > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > > throughput, not just additional CPU time.
> > >
> > > How's that related to IO throttling being active or not?
> >
> > You might find that you have to throttle the IO down when operating
> > exclusively against one IO channel, but if you have multiple IO channels
> > then the acceptable IO utilization could be higher as it would be
> > spread across the different IO channels.
> >
> > In other words, the overall i/o allowance for a given operation might be
> > able to be higher if it's spread across multiple i/o channels, as it
> > wouldn't completely consume the i/o resources of any of them, whereas
> > with a higher allowance and a single i/o channel, there would likely be
> > an impact to other operations.
> >
> > As for if this is really relevant only when it comes to parallel
> > operations is a bit of an interesting question- these considerations
> > might not require actual parallel operations as a single process might
> > be able to go through multiple indexes concurrently and still hit the
> > i/o limit that was set for it overall across the tablespaces.  I don't
> > know that it would actually be interesting or useful to spend the effort
> > to make that work though, so, from a practical perspective, it's
> > probably only interesting to think about this when talking about
> > parallel vacuum.
>
> But you could just apply different budgets for different tablespaces?

Yes, that would be one approach to addressing this, though it would
change the existing meaning of those cost parameters.  I'm not sure if
we think that's an issue or not- if we only have this in the case of a
parallel vacuum then it's probably fine, I'm less sure if it'd be
alright to change that on an upgrade.

> That's quite doable independent of parallelism, as we don't have tables
> or indexes spanning more than one tablespace.  True, you could then make
> the processing of an individual vacuum faster by allowing to utilize
> multiple tablespace budgets at the same time.

Yes, it's possible to do independent of parallelism, but what I was
trying to get at above is that it might not be worth the effort.  When
it comes to parallel vacuum though, I'm not sure that you can just punt
on this question since you'll naturally end up spanning multiple
tablespaces concurrently, at least if the heap+indexes are spread across
multiple tablespaces and you're operating against more than one of those
relations at a time (which, I admit, I'm not 100% sure is actually
happening with this proposed patch set- if it isn't, then this isn't
really an issue, though that would be pretty unfortunate as then you
can't leverage multiple i/o channels concurrently and therefore Jeff's
question about why you'd be doing parallel vacuum with IO throttling is
a pretty good one).

Thanks,

Stephen


Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Mon, Nov 4, 2019 at 11:42 PM Andres Freund <andres@anarazel.de> wrote:
>
>
> > The two approaches to solve this problem being discussed in that
> > thread [1] are as follows:
> > (a) Allow the parallel workers and master backend to have a shared
> > view of vacuum cost related parameters (mainly VacuumCostBalance) and
> > allow each worker to update it and then based on that decide whether
> > it needs to sleep.  Sawada-San has done the POC for this approach.
> > See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> > drawback of this approach could be that we allow the worker to sleep
> > even though the I/O has been performed by some other worker.
>
> I don't understand this drawback.
>

I think the problem could be that the system is not properly throttled
when it is supposed to be.  Let me try a simple example: say we have
two workers, w-1 and w-2.  The w-2 is primarily doing the I/O and w-1
is doing very little I/O, but unfortunately whenever w-1 checks it
finds that cost_limit has been exceeded and it goes for sleep, but w-1
still continues.  Now in such a situation, even though we have made one
of the workers sleep for the required time, ideally the worker which
was doing the I/O should have slept.  The aim is to make the system
stop doing I/O whenever the limit has been exceeded, so that might not
work in the above situation.

>
> > (b) The other idea could be that we split the I/O among workers
> > something similar to what we do for auto vacuum workers (see
> > autovac_balance_cost).  The basic idea would be that before launching
> > workers, we need to compute the remaining I/O (heap operation would
> > have used something) after which we need to sleep and split it equally
> > across workers.  Here, we are primarily thinking of dividing
> > VacuumCostBalance and VacuumCostLimit parameters.  Once the workers
> > are finished, they need to let master backend know how much I/O they
> > have consumed and then master backend can add it to it's current I/O
> > consumed.  I think we also need to rebalance the cost of remaining
> > workers once some of the worker's exit.  Dilip has prepared a POC
> > patch for this, see 0002-POC-divide-vacuum-cost-limit in email [3].
>
> (b) doesn't strike me as advantageous. It seems quite possible that you
> end up with one worker that has a lot more IO than others, leading to
> unnecessary sleeps, even though the actually available IO budget has not
> been used up.
>

Yeah, this is possible, but to an extent, this is possible in the
current design as well where we balance the cost among autovacuum
workers.  Now, it is quite possible that the current design itself is
not good and we don't want to do the same thing at another place, but
at least we will be consistent and can explain the overall behavior.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Mon, Nov 4, 2019 at 11:58 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > For parallel vacuum [1], we were discussing what is the best way to
> > > divide the cost among parallel workers but we didn't get many inputs
> > > apart from people who are very actively involved in patch development.
> > > I feel that we need some more inputs before we finalize anything, so
> > > starting a new thread.
> > >
> >
> > Maybe a I just don't have experience in the type of system that parallel
> > vacuum is needed for, but if there is any meaningful IO throttling which is
> > active, then what is the point of doing the vacuum in parallel in the first
> > place?
>
> I am wondering the same - but to be fair, it's pretty easy to run into
> cases where VACUUM is CPU bound. E.g. because most pages are in
> shared_buffers, and compared to the size of the indexes number of tids
> that need to be pruned is fairly small (also [1]). That means a lot of
> pages need to be scanned, without a whole lot of IO going on. The
> problem with that is just that the defaults for vacuum throttling will
> also apply here, I've never seen anybody tune vacuum_cost_page_hit = 0,
> vacuum_cost_page_dirty=0 or such (in contrast, the latter is the highest
> cost currently).  Nor do we reduce the cost of vacuum_cost_page_dirty
> for unlogged tables.
>
> So while it doesn't seem unreasonable to want to use cost limiting to
> protect against vacuum unexpectedly causing too much, especially read,
> IO, I'm doubtful it has current practical relevance.
>

IIUC, you mean to say that it is not of much practical use to do a
parallel vacuum if I/O throttling is enabled for an operation, is that
right?


> I'm wondering how much of the benefit of parallel vacuum really is just
> to work around vacuum ringbuffers often massively hurting performance
> (see e.g. [2]).
>

Yeah, it is a good thing to check, but if anything, I think a parallel
vacuum will further improve the performance with larger ring buffers
as it will make it more CPU bound.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Tue, Nov 5, 2019 at 1:12 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
>
> > I've been wondering if the accounting system should consider the cost
> > per tablespace when there's multiple tablespaces involved, instead of
> > throttling the overall process without consideration for the
> > per-tablespace utilization.
>
> This all seems like a feature proposal, or two, independent of the
> patch/question at hand. I think there's a good argument to be had that
> we should severely overhaul the current vacuum cost limiting - it's way
> way too hard to understand the bandwidth that it's allowed to
> consume. But unless one of the proposals makes that measurably harder or
> easier, I think we don't gain anything by entangling an already complex
> patchset with something new.
>

+1.  I think even if we want something related to per-tablespace
costing for vacuum (parallel), it should be done as a separate patch.
It is a whole new area where we need to define the appropriate way to
achieve it.  It is going to change the current vacuum costing system
in a big way, which I don't think is reasonable to do as part of a
parallel vacuum patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Tue, Nov 5, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 4, 2019 at 11:58 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > For parallel vacuum [1], we were discussing what is the best way to
> > > > divide the cost among parallel workers but we didn't get many inputs
> > > > apart from people who are very actively involved in patch development.
> > > > I feel that we need some more inputs before we finalize anything, so
> > > > starting a new thread.
> > > >
> > >
> > > Maybe a I just don't have experience in the type of system that parallel
> > > vacuum is needed for, but if there is any meaningful IO throttling which is
> > > active, then what is the point of doing the vacuum in parallel in the first
> > > place?
> >
> > I am wondering the same - but to be fair, it's pretty easy to run into
> > cases where VACUUM is CPU bound. E.g. because most pages are in
> > shared_buffers, and compared to the size of the indexes number of tids
> > that need to be pruned is fairly small (also [1]). That means a lot of
> > pages need to be scanned, without a whole lot of IO going on. The
> > problem with that is just that the defaults for vacuum throttling will
> > also apply here, I've never seen anybody tune vacuum_cost_page_hit = 0,
> > vacuum_cost_page_dirty=0 or such (in contrast, the latter is the highest
> > cost currently).  Nor do we reduce the cost of vacuum_cost_page_dirty
> > for unlogged tables.
> >
> > So while it doesn't seem unreasonable to want to use cost limiting to
> > protect against vacuum unexpectedly causing too much, especially read,
> > IO, I'm doubtful it has current practical relevance.
> >
>
> IIUC, you mean to say that it is of not much practical use to do
> parallel vacuum if I/O throttling is enabled for an operation, is that
> right?
>
>
> > I'm wondering how much of the benefit of parallel vacuum really is just
> > to work around vacuum ringbuffers often massively hurting performance
> > (see e.g. [2]).
> >
>
> Yeah, it is a good thing to check, but if anything, I think a parallel
> vacuum will further improve the performance with larger ring buffers
> as it will make it more CPU bound.
I have tested the same, and the results show that by increasing the
ring buffer size we see a performance gain.  And the gain is even
larger with the parallel vacuum.

Test case:
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index idx1 on test(a);
create index idx2 on test(b);
create index idx3 on test(c);
create index idx4 on test(d);
create index idx5 on test(e);
create index idx6 on test(f);
create index idx7 on test(g);
create index idx8 on test(h);
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a < 300000;

(I have tested the parallel vacuum and non-parallel vacuum with
different ring buffer sizes)

8 indexes
ring buffer size 246kb -> non-parallel: 7.6 seconds   parallel (2 workers): 3.9 seconds
ring buffer size 256mb -> non-parallel: 6.1 seconds   parallel (2 workers): 3.2 seconds

4 indexes
ring buffer size 246kb -> non-parallel: 4.8 seconds   parallel (2 workers): 3.2 seconds
ring buffer size 256mb -> non-parallel: 3.8 seconds   parallel (2 workers): 2.6 seconds

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Andres Freund
Date:
Hi,

On November 5, 2019 7:16:41 AM PST, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>On Tue, Nov 5, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com>
>wrote:
>>
>> On Mon, Nov 4, 2019 at 11:58 PM Andres Freund <andres@anarazel.de>
>wrote:
>> >
>> > Hi,
>> >
>> > On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
>> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila
><amit.kapila16@gmail.com> wrote:
>> > >
>> > > > For parallel vacuum [1], we were discussing what is the best
>way to
>> > > > divide the cost among parallel workers but we didn't get many
>inputs
>> > > > apart from people who are very actively involved in patch
>development.
>> > > > I feel that we need some more inputs before we finalize
>anything, so
>> > > > starting a new thread.
>> > > >
>> > >
>> > > Maybe a I just don't have experience in the type of system that
>parallel
>> > > vacuum is needed for, but if there is any meaningful IO
>throttling which is
>> > > active, then what is the point of doing the vacuum in parallel in
>the first
>> > > place?
>> >
>> > I am wondering the same - but to be fair, it's pretty easy to run
>into
>> > cases where VACUUM is CPU bound. E.g. because most pages are in
>> > shared_buffers, and compared to the size of the indexes number of
>tids
>> > that need to be pruned is fairly small (also [1]). That means a lot
>of
>> > pages need to be scanned, without a whole lot of IO going on. The
>> > problem with that is just that the defaults for vacuum throttling
>will
>> > also apply here, I've never seen anybody tune vacuum_cost_page_hit
>= 0,
>> > vacuum_cost_page_dirty=0 or such (in contrast, the latter is the
>highest
>> > cost currently).  Nor do we reduce the cost of
>vacuum_cost_page_dirty
>> > for unlogged tables.
>> >
>> > So while it doesn't seem unreasonable to want to use cost limiting
>to
>> > protect against vacuum unexpectedly causing too much, especially
>read,
>> > IO, I'm doubtful it has current practical relevance.
>> >
>>
>> IIUC, you mean to say that it is of not much practical use to do
>> parallel vacuum if I/O throttling is enabled for an operation, is
>that
>> right?
>>
>>
>> > I'm wondering how much of the benefit of parallel vacuum really is
>just
>> > to work around vacuum ringbuffers often massively hurting
>performance
>> > (see e.g. [2]).
>> >
>>
>> Yeah, it is a good thing to check, but if anything, I think a
>parallel
>> vacuum will further improve the performance with larger ring buffers
>> as it will make it more CPU bound.
>I have tested the same and the results prove that increasing the ring
>buffer size we can see the performance gain.  And, the gain is much
>more with the parallel vacuum.
>
>Test case:
>create table test(a int, b int, c int, d int, e int, f int, g int, h
>int);
>create index idx1 on test(a);
>create index idx2 on test(b);
>create index idx3 on test(c);
>create index idx4 on test(d);
>create index idx5 on test(e);
>create index idx6 on test(f);
>create index idx7 on test(g);
>create index idx8 on test(h);
>insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000)
>as i;
>delete from test where a < 300000;
>
>( I have tested the parallel vacuum and non-parallel vacuum with
>different ring buffer size)

Thanks!

>8 index
>ring buffer size 246kb-> non-parallel: 7.6 seconds   parallel (2
>worker): 3.9 seconds
>ring buffer size 256mb-> non-parallel: 6.1 seconds   parallel (2
>worker): 3.2 seconds
>
>4 index
>ring buffer size 246kb -> non-parallel: 4.8 seconds   parallel (2
>worker): 3.2 seconds
>ring buffer size 256mb -> non-parallel: 3.8 seconds   parallel (2
>worker): 2.6 seconds

What about the case of just disabling the ring buffer logic?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Tue, Nov 5, 2019 at 8:49 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On November 5, 2019 7:16:41 AM PST, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >On Tue, Nov 5, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com>
> >wrote:
> >>
> >> On Mon, Nov 4, 2019 at 11:58 PM Andres Freund <andres@anarazel.de>
> >wrote:
> >> >
> >> > Hi,
> >> >
> >> > On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> >> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila
> ><amit.kapila16@gmail.com> wrote:
> >> > >
> >> > > > For parallel vacuum [1], we were discussing what is the best
> >way to
> >> > > > divide the cost among parallel workers but we didn't get many
> >inputs
> >> > > > apart from people who are very actively involved in patch
> >development.
> >> > > > I feel that we need some more inputs before we finalize
> >anything, so
> >> > > > starting a new thread.
> >> > > >
> >> > >
> >> > > Maybe a I just don't have experience in the type of system that
> >parallel
> >> > > vacuum is needed for, but if there is any meaningful IO
> >throttling which is
> >> > > active, then what is the point of doing the vacuum in parallel in
> >the first
> >> > > place?
> >> >
> >> > I am wondering the same - but to be fair, it's pretty easy to run
> >into
> >> > cases where VACUUM is CPU bound. E.g. because most pages are in
> >> > shared_buffers, and compared to the size of the indexes number of
> >tids
> >> > that need to be pruned is fairly small (also [1]). That means a lot
> >of
> >> > pages need to be scanned, without a whole lot of IO going on. The
> >> > problem with that is just that the defaults for vacuum throttling
> >will
> >> > also apply here, I've never seen anybody tune vacuum_cost_page_hit
> >= 0,
> >> > vacuum_cost_page_dirty=0 or such (in contrast, the latter is the
> >highest
> >> > cost currently).  Nor do we reduce the cost of
> >vacuum_cost_page_dirty
> >> > for unlogged tables.
> >> >
> >> > So while it doesn't seem unreasonable to want to use cost limiting
> >to
> >> > protect against vacuum unexpectedly causing too much, especially
> >read,
> >> > IO, I'm doubtful it has current practical relevance.
> >> >
> >>
> >> IIUC, you mean to say that it is of not much practical use to do
> >> parallel vacuum if I/O throttling is enabled for an operation, is
> >that
> >> right?
> >>
> >>
> >> > I'm wondering how much of the benefit of parallel vacuum really is
> >just
> >> > to work around vacuum ringbuffers often massively hurting
> >performance
> >> > (see e.g. [2]).
> >> >
> >>
> >> Yeah, it is a good thing to check, but if anything, I think a
> >parallel
> >> vacuum will further improve the performance with larger ring buffers
> >> as it will make it more CPU bound.
> >I have tested the same and the results prove that increasing the ring
> >buffer size we can see the performance gain.  And, the gain is much
> >more with the parallel vacuum.
> >
> >Test case:
> >create table test(a int, b int, c int, d int, e int, f int, g int, h
> >int);
> >create index idx1 on test(a);
> >create index idx2 on test(b);
> >create index idx3 on test(c);
> >create index idx4 on test(d);
> >create index idx5 on test(e);
> >create index idx6 on test(f);
> >create index idx7 on test(g);
> >create index idx8 on test(h);
> >insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000)
> >as i;
> >delete from test where a < 300000;
> >
> >( I have tested the parallel vacuum and non-parallel vacuum with
> >different ring buffer size)
>
> Thanks!
>
> >8 index
> >ring buffer size 246kb-> non-parallel: 7.6 seconds   parallel (2
> >worker): 3.9 seconds
> >ring buffer size 256mb-> non-parallel: 6.1 seconds   parallel (2
> >worker): 3.2 seconds
> >
> >4 index
> >ring buffer size 246kb -> non-parallel: 4.8 seconds   parallel (2
> >worker): 3.2 seconds
> >ring buffer size 256mb -> non-parallel: 3.8 seconds   parallel (2
> >worker): 2.6 seconds
>
> What about the case of just disabling the ring buffer logic?
>
Repeated the same test after disabling the ring buffer logic.  The
results are almost the same as with the increased ring buffer size.

Tested with 4GB shared buffers:

8 indexes
use shared buffers -> non-parallel: 6.2 seconds   parallel (2 workers): 3.3 seconds

4 indexes
use shared buffers -> non-parallel: 3.8 seconds   parallel (2 workers): 2.7 seconds

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Tue, Nov 5, 2019 at 1:42 AM Stephen Frost <sfrost@snowman.net> wrote:
> * Andres Freund (andres@anarazel.de) wrote:
>
> > That's quite doable independent of parallelism, as we don't have tables
> > or indexes spanning more than one tablespace.  True, you could then make
> > the processing of an individual vacuum faster by allowing to utilize
> > multiple tablespace budgets at the same time.
>
> Yes, it's possible to do independent of parallelism, but what I was
> trying to get at above is that it might not be worth the effort.  When
> it comes to parallel vacuum though, I'm not sure that you can just punt
> on this question since you'll naturally end up spanning multiple
> tablespaces concurrently, at least if the heap+indexes are spread across
> multiple tablespaces and you're operating against more than one of those
> relations at a time
>

Each parallel worker operates on a separate index.  It might be worth
exploring per-tablespace vacuum throttling, but that should not be a
requirement for the currently proposed patch.

As per the feedback in this thread, it seems that for now it is better
if we allow a parallel vacuum only when I/O throttling is not enabled.
We can later extend it based on feedback from the field once the
feature starts getting used.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Andres Freund
Date:
Hi,

On 2019-11-06 07:53:09 +0530, Amit Kapila wrote:
> As per feedback in this thread, it seems that for now, it is better,
> if we can allow a parallel vacuum only when I/O throttling is not
> enabled.  We can later extend it based on feedback from the field once
> the feature starts getting used.

That's not my read on this thread.  I don't think we should introduce
this feature without a solution for the throttling.

Greetings,

Andres Freund



Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Wed, Nov 6, 2019 at 7:55 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-11-06 07:53:09 +0530, Amit Kapila wrote:
> > As per feedback in this thread, it seems that for now, it is better,
> > if we can allow a parallel vacuum only when I/O throttling is not
> > enabled.  We can later extend it based on feedback from the field once
> > the feature starts getting used.
>
> That's not my read on this thread.  I don't think we should introduce
> this feature without a solution for the throttling.
>

Okay, then I misunderstood your response to Jeff's email [1].  Anyway,
we have already explored two different approaches, as mentioned in the
initial email, which show somewhat similar results in initial tests.
So, we can explore more along those lines.  Do you have any preference
or any other idea?


[1] - https://www.postgresql.org/message-id/20191104182829.57bkz64qn5k3uwc3%40alap3.anarazel.de

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Stephen Frost
Date:
Greetings,

* Amit Kapila (amit.kapila16@gmail.com) wrote:
> On Tue, Nov 5, 2019 at 1:42 AM Stephen Frost <sfrost@snowman.net> wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > That's quite doable independent of parallelism, as we don't have tables
> > > or indexes spanning more than one tablespace.  True, you could then make
> > > the processing of an individual vacuum faster by allowing to utilize
> > > multiple tablespace budgets at the same time.
> >
> > Yes, it's possible to do independent of parallelism, but what I was
> > trying to get at above is that it might not be worth the effort.  When
> > it comes to parallel vacuum though, I'm not sure that you can just punt
> > on this question since you'll naturally end up spanning multiple
> > tablespaces concurrently, at least if the heap+indexes are spread across
> > multiple tablespaces and you're operating against more than one of those
> > relations at a time
>
> Each parallel worker operates on a separate index.  It might be worth
> exploring per-tablespace vacuum throttling, but that should not be a
> requirement for the currently proposed patch.

Right, that each operates on a separate index in parallel is what I had
figured was probably happening, and that's why I brought up the question
of "well, what does IO throttling mean when you've got multiple
tablespaces involved with presumably independent IO channels...?" (or,
at least, that's what I was trying to go for).

This isn't a question with the current system and way the code works
within a single vacuum operation, as we're never operating on more than
one relation concurrently in that case.

Of course, we don't currently do anything to manage IO utilization
across tablespaces when there are multiple autovacuum workers running
concurrently, which I suppose goes to Andres' point that we aren't
really doing anything to deal with this today and therefore this is
perhaps not all that new of an issue just with the addition of
parallel vacuum.  I'd still argue that it becomes a lot more apparent
when you're talking about one parallel vacuum, but ultimately we should
probably be thinking about how to manage the resources across all the
vacuums and tablespaces and queries and such.

In an ideal world, we'd track the i/o from front-end queries, have some
idea of the total i/o possible for each IO channel, and allow vacuum and
whatever other background processes need to run to scale up and down,
with enough buffer to avoid ever being maxed out on i/o, but keeping up
a consistent rate of i/o that lets everything finish as quickly as
possible.

Thanks,

Stephen


Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Tue, Nov 5, 2019 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 4, 2019 at 11:42 PM Andres Freund <andres@anarazel.de> wrote:
> >
> >
> > > The two approaches to solve this problem being discussed in that
> > > thread [1] are as follows:
> > > (a) Allow the parallel workers and master backend to have a shared
> > > view of vacuum cost related parameters (mainly VacuumCostBalance) and
> > > allow each worker to update it and then based on that decide whether
> > > it needs to sleep.  Sawada-San has done the POC for this approach.
> > > See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> > > drawback of this approach could be that we allow the worker to sleep
> > > even though the I/O has been performed by some other worker.
> >
> > I don't understand this drawback.
> >
>
> I think the problem could be that the system is not properly throttled
> when it is supposed to be.  Let me try by a simple example, say we
> have two workers w-1 and w-2.  The w-2 is primarily doing the I/O and
> w-1 is doing very less I/O but unfortunately whenever w-1 checks it
> finds that cost_limit has exceeded and it goes for sleep, but w-1
> still continues.
>

Typo in the above sentence.  /but w-1 still continues/but w-2 still continues.

>  Now in such a situation even though we have made one
> of the workers slept for a required time but ideally the worker which
> was doing I/O should have slept.  The aim is to make the system stop
> doing I/O whenever the limit has exceeded, so that might not work in
> the above situation.
>

One idea to fix this drawback is that if we somehow avoid letting the
workers which have done less or no I/O compared to other workers sleep,
then we can to a good extent ensure that workers which are doing more
I/O will be throttled more.  What we can do is allow any worker to
sleep only if it has performed I/O above a certain threshold and the
overall balance is more than the cost_limit set by the system.  Then we
will allow the worker to sleep in proportion to the work done by it and
reduce the VacuumSharedCostBalance by the amount consumed by the
current worker.  Something like:

If (VacuumSharedCostBalance >= VacuumCostLimit &&
    MyCostBalance > threshold * (VacuumCostLimit / workers))
{
    VacuumSharedCostBalance -= MyCostBalance;
    Sleep(delay * MyCostBalance / VacuumSharedCostBalance);
}

Assume the threshold is 0.5; what that means is, if a worker has done
more than 50% of the work expected from it and the overall shared cost
balance is exceeded, then we will consider this worker for sleeping.
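
Spelling the same check out a bit more (the variable names are just the
placeholders from the sketch above; in real code VacuumSharedCostBalance
would live in shared memory and need atomic or lock-protected updates,
which is omitted here, and the sleep fraction is computed before the
deduction so it stays well-defined):

#include "postgres.h"
#include "miscadmin.h"

/* Placeholders for shared and per-worker bookkeeping. */
static double VacuumSharedCostBalance = 0;  /* would live in shared memory */
static double MyCostBalance = 0;            /* this worker's accumulated cost */

static void
maybe_sleep(double threshold, int nworkers)
{
    /* Only consider sleeping once the shared limit has been exceeded. */
    if (VacuumSharedCostBalance >= VacuumCostLimit &&
        MyCostBalance > threshold * (VacuumCostLimit / nworkers))
    {
        /* Sleep in proportion to this worker's share of the balance ... */
        double      fraction = MyCostBalance / VacuumSharedCostBalance;

        /* ... and give back only what this worker itself consumed. */
        VacuumSharedCostBalance -= MyCostBalance;
        MyCostBalance = 0;
        pg_usleep((long) (VacuumCostDelay * 1000 * fraction));
    }
}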

What do you guys think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Wed, Nov 6, 2019 at 9:21 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Amit Kapila (amit.kapila16@gmail.com) wrote:
> > On Tue, Nov 5, 2019 at 1:42 AM Stephen Frost <sfrost@snowman.net> wrote:
> > > * Andres Freund (andres@anarazel.de) wrote:
> > > > That's quite doable independent of parallelism, as we don't have tables
> > > > or indexes spanning more than one tablespace.  True, you could then make
> > > > the processing of an individual vacuum faster by allowing to utilize
> > > > multiple tablespace budgets at the same time.
> > >
> > > Yes, it's possible to do independent of parallelism, but what I was
> > > trying to get at above is that it might not be worth the effort.  When
> > > it comes to parallel vacuum though, I'm not sure that you can just punt
> > > on this question since you'll naturally end up spanning multiple
> > > tablespaces concurrently, at least if the heap+indexes are spread across
> > > multiple tablespaces and you're operating against more than one of those
> > > relations at a time
> >
> > Each parallel worker operates on a separate index.  It might be worth
> > exploring per-tablespace vacuum throttling, but that should not be a
> > requirement for the currently proposed patch.
>
> Right, that each operates on a separate index in parallel is what I had
> figured was probably happening, and that's why I brought up the question
> of "well, what does IO throttling mean when you've got multiple
> tablespaces involved with presumably independent IO channels...?" (or,
> at least, that's what I was trying to go for).
>
> This isn't a question with the current system and way the code works
> within a single vacuum operation, as we're never operating on more than
> one relation concurrently in that case.
>
> Of course, we don't currently do anything to manage IO utilization
> across tablespaces when there are multiple autovacuum workers running
> concurrently, which I suppose goes to Andres' point that we aren't
> really doing anything to deal with this today and therefore this is
> perhaps not all that new of an issue just with the addition of
> parallel vacuum.  I'd still argue that it becomes a lot more apparent
> when you're talking about one parallel vacuum, but ultimately we should
> probably be thinking about how to manage the resources across all the
> vacuums and tablespaces and queries and such.
>
> In an ideal world, we'd track the i/o from front-end queries, have some
> idea of the total i/o possible for each IO channel, and allow vacuum and
> whatever other background processes need to run to scale up and down,
> with enough buffer to avoid ever being maxed out on i/o, but keeping up
> a consistent rate of i/o that lets everything finish as quickly as
> possible.

IMHO, suppose in the future we improve the I/O throttling per
tablespace, maybe by maintaining an independent balance for the
relation and for each index of the relation, or perhaps a combined
balance for the indexes that are on the same tablespace; that balance
could then be checked against its tablespace's I/O limit.  If we get
such a mechanism in the future, it seems it would extend easily to
parallel vacuum, wouldn't it?  Because across workers we can also
track a tablespace-wise shared balance (if we go with the shared
costing approach, for example).
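
To make that a bit more concrete, here is a purely illustrative
standalone sketch of a per-tablespace shared balance that workers
could charge against (none of these structures exist in the current
patches; the OIDs and limits are made up):

#include <stdio.h>
#include <stdbool.h>

typedef unsigned int Oid;

typedef struct TablespaceCostBalance
{
    Oid         tablespace_oid;     /* which tablespace */
    int         cost_limit;         /* I/O budget for this tablespace */
    int         cost_balance;       /* shared balance, updated by all workers */
} TablespaceCostBalance;

#define NUM_TABLESPACES 2

/* Would live in DSM and be updated under a lock in a real implementation. */
static TablespaceCostBalance balances[NUM_TABLESPACES] = {
    {16385, 200, 0},        /* tablespace holding the heap and two indexes */
    {16386, 200, 0},        /* tablespace holding the remaining indexes */
};

/* Charge 'cost' against the given tablespace; true means "should sleep". */
static bool
charge_and_check(Oid tablespace_oid, int cost)
{
    for (int i = 0; i < NUM_TABLESPACES; i++)
    {
        if (balances[i].tablespace_oid == tablespace_oid)
        {
            balances[i].cost_balance += cost;
            return balances[i].cost_balance >= balances[i].cost_limit;
        }
    }
    return false;
}

int
main(void)
{
    /* a worker that dirties pages only in the second tablespace */
    if (charge_and_check(16386, 220))
        printf("worker should sleep: tablespace 16386 budget exceeded\n");

    /* another worker touching the first tablespace stays under budget */
    if (!charge_and_check(16385, 40))
        printf("tablespace 16385 still under budget, no sleep needed\n");

    return 0;
}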

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Masahiko Sawada
Date:
On Wed, 6 Nov 2019 at 15:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 5, 2019 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 4, 2019 at 11:42 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > >
> > > > The two approaches to solve this problem being discussed in that
> > > > thread [1] are as follows:
> > > > (a) Allow the parallel workers and master backend to have a shared
> > > > view of vacuum cost related parameters (mainly VacuumCostBalance) and
> > > > allow each worker to update it and then based on that decide whether
> > > > it needs to sleep.  Sawada-San has done the POC for this approach.
> > > > See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> > > > drawback of this approach could be that we allow the worker to sleep
> > > > even though the I/O has been performed by some other worker.
> > >
> > > I don't understand this drawback.
> > >
> >
> > I think the problem could be that the system is not properly throttled
> > when it is supposed to be.  Let me try by a simple example, say we
> > have two workers w-1 and w-2.  The w-2 is primarily doing the I/O and
> > w-1 is doing very less I/O but unfortunately whenever w-1 checks it
> > finds that cost_limit has exceeded and it goes for sleep, but w-1
> > still continues.
> >
>
> Typo in the above sentence.  /but w-1 still continues/but w-2 still continues.
>
> >  Now in such a situation even though we have made one
> > of the workers slept for a required time but ideally the worker which
> > was doing I/O should have slept.  The aim is to make the system stop
> > doing I/O whenever the limit has exceeded, so that might not work in
> > the above situation.
> >
>
> One idea to fix this drawback is that if we somehow avoid letting the
> workers sleep which has done less or no I/O as compared to other
> workers, then we can to a good extent ensure that workers which are
> doing more I/O will be throttled more.  What we can do is to allow any
> worker sleep only if it has performed the I/O above a certain
> threshold and the overall balance is more than the cost_limit set by
> the system.  Then we will allow the worker to sleep proportional to
> the work done by it and reduce the VacuumSharedCostBalance by the
> amount which is consumed by the current worker.  Something like:
>
> If ( VacuumSharedCostBalance >= VacuumCostLimit &&
>     MyCostBalance > (threshold) VacuumCostLimit / workers)
> {
> VacuumSharedCostBalance -= MyCostBalance;
> Sleep (delay * MyCostBalance/VacuumSharedCostBalance)
> }
>
> Assume threshold be 0.5, what that means is, if it has done work more
> than 50% of what is expected from this worker and the overall share
> cost balance is exceeded, then we will consider this worker to sleep.
>
> What do you guys think?

I think the idea that workers consuming more I/O sleep for a longer
time seems good.  It doesn't seem to have the drawback of approach (b),
which is to unnecessarily delay vacuum if some indexes are very small
or if bulk-deletion of an index does almost nothing, as with brin.  But
on the other hand it's possible that workers don't sleep even if the
shared cost balance already exceeds the limit, because sleeping
requires that the local balance exceed the limit divided by the number
of workers.  For example, a worker is scheduled, does I/O and exceeds
the limit substantially while the other 2 workers do less I/O.  Then
the 2 workers are scheduled and consume I/O.  The total cost balance
already exceeds the limit, but those workers will not sleep as long as
their local balance is less than (limit / # of workers).
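
To put rough, purely illustrative numbers on it: with
vacuum_cost_limit = 200 and 3 workers, the per-worker threshold is
200 / 3 ~= 67.  Suppose one worker runs its local balance up to 150 and
is then descheduled, while the other two workers each accumulate 40.
The shared balance is 230, above the limit, yet neither of the two
scheduled workers sleeps when it checks, because 40 < 67, so the I/O
continues until the first worker runs again or one of the others
crosses its local threshold.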

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Fri, Nov 8, 2019 at 8:18 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 6 Nov 2019 at 15:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 5, 2019 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Nov 4, 2019 at 11:42 PM Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > >
> > > > > The two approaches to solve this problem being discussed in that
> > > > > thread [1] are as follows:
> > > > > (a) Allow the parallel workers and master backend to have a shared
> > > > > view of vacuum cost related parameters (mainly VacuumCostBalance) and
> > > > > allow each worker to update it and then based on that decide whether
> > > > > it needs to sleep.  Sawada-San has done the POC for this approach.
> > > > > See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> > > > > drawback of this approach could be that we allow the worker to sleep
> > > > > even though the I/O has been performed by some other worker.
> > > >
> > > > I don't understand this drawback.
> > > >
> > >
> > > I think the problem could be that the system is not properly throttled
> > > when it is supposed to be.  Let me try by a simple example, say we
> > > have two workers w-1 and w-2.  The w-2 is primarily doing the I/O and
> > > w-1 is doing very less I/O but unfortunately whenever w-1 checks it
> > > finds that cost_limit has exceeded and it goes for sleep, but w-1
> > > still continues.
> > >
> >
> > Typo in the above sentence.  /but w-1 still continues/but w-2 still continues.
> >
> > >  Now in such a situation even though we have made one
> > > of the workers slept for a required time but ideally the worker which
> > > was doing I/O should have slept.  The aim is to make the system stop
> > > doing I/O whenever the limit has exceeded, so that might not work in
> > > the above situation.
> > >
> >
> > One idea to fix this drawback is that if we somehow avoid letting the
> > workers sleep which has done less or no I/O as compared to other
> > workers, then we can to a good extent ensure that workers which are
> > doing more I/O will be throttled more.  What we can do is to allow any
> > worker sleep only if it has performed the I/O above a certain
> > threshold and the overall balance is more than the cost_limit set by
> > the system.  Then we will allow the worker to sleep proportional to
> > the work done by it and reduce the VacuumSharedCostBalance by the
> > amount which is consumed by the current worker.  Something like:
> >
> > If ( VacuumSharedCostBalance >= VacuumCostLimit &&
> >     MyCostBalance > (threshold) VacuumCostLimit / workers)
> > {
> > VacuumSharedCostBalance -= MyCostBalance;
> > Sleep (delay * MyCostBalance/VacuumSharedCostBalance)
> > }
> >
> > Assume threshold be 0.5, what that means is, if it has done work more
> > than 50% of what is expected from this worker and the overall share
> > cost balance is exceeded, then we will consider this worker to sleep.
> >
> > What do you guys think?
>
> I think the idea that the more consuming I/O they sleep more longer
> time seems good. There seems not to be the drawback of approach(b)
> that is to unnecessarily delay vacuum if some indexes are very small
> or bulk-deletions of indexes does almost nothing such as brin. But on
> the other hand it's possible that workers don't sleep even if shared
> cost balance already exceeds the limit because it's necessary for
> sleeping that local balance exceeds the worker's limit divided by the
> number of workers. For example, a worker is scheduled doing I/O and
> exceeds the limit substantially while other 2 workers do less I/O. And
> then the 2 workers are scheduled and consume I/O. The total cost
> balance already exceeds the limit but the workers will not sleep as
> long as the local balance is less than (limit / # of workers).
>

Right, this is the reason I suggested keeping some threshold for the
local balance (say 50% of (limit / # of workers)).  I think we need to
do some experiments to see what is the best thing to do.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Fri, Nov 8, 2019 at 8:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Nov 8, 2019 at 8:18 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 6 Nov 2019 at 15:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Nov 5, 2019 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 4, 2019 at 11:42 PM Andres Freund <andres@anarazel.de> wrote:
> > > > >
> > > > >
> > > > > > The two approaches to solve this problem being discussed in that
> > > > > > thread [1] are as follows:
> > > > > > (a) Allow the parallel workers and master backend to have a shared
> > > > > > view of vacuum cost related parameters (mainly VacuumCostBalance) and
> > > > > > allow each worker to update it and then based on that decide whether
> > > > > > it needs to sleep.  Sawada-San has done the POC for this approach.
> > > > > > See v32-0004-PoC-shared-vacuum-cost-balance in email [2].  One
> > > > > > drawback of this approach could be that we allow the worker to sleep
> > > > > > even though the I/O has been performed by some other worker.
> > > > >
> > > > > I don't understand this drawback.
> > > > >
> > > >
> > > > I think the problem could be that the system is not properly throttled
> > > > when it is supposed to be.  Let me try by a simple example, say we
> > > > have two workers w-1 and w-2.  The w-2 is primarily doing the I/O and
> > > > w-1 is doing very less I/O but unfortunately whenever w-1 checks it
> > > > finds that cost_limit has exceeded and it goes for sleep, but w-1
> > > > still continues.
> > > >
> > >
> > > Typo in the above sentence.  /but w-1 still continues/but w-2 still continues.
> > >
> > > >  Now in such a situation even though we have made one
> > > > of the workers slept for a required time but ideally the worker which
> > > > was doing I/O should have slept.  The aim is to make the system stop
> > > > doing I/O whenever the limit has exceeded, so that might not work in
> > > > the above situation.
> > > >
> > >
> > > One idea to fix this drawback is that if we somehow avoid letting the
> > > workers sleep which has done less or no I/O as compared to other
> > > workers, then we can to a good extent ensure that workers which are
> > > doing more I/O will be throttled more.  What we can do is to allow any
> > > worker sleep only if it has performed the I/O above a certain
> > > threshold and the overall balance is more than the cost_limit set by
> > > the system.  Then we will allow the worker to sleep proportional to
> > > the work done by it and reduce the VacuumSharedCostBalance by the
> > > amount which is consumed by the current worker.  Something like:
> > >
> > > If ( VacuumSharedCostBalance >= VacuumCostLimit &&
> > >     MyCostBalance > (threshold) VacuumCostLimit / workers)
> > > {
> > > VacuumSharedCostBalance -= MyCostBalance;
> > > Sleep (delay * MyCostBalance/VacuumSharedCostBalance)
> > > }
> > >
> > > Assume threshold be 0.5, what that means is, if it has done work more
> > > than 50% of what is expected from this worker and the overall share
> > > cost balance is exceeded, then we will consider this worker to sleep.
> > >
> > > What do you guys think?
> >
> > I think the idea that the more consuming I/O they sleep more longer
> > time seems good. There seems not to be the drawback of approach(b)
> > that is to unnecessarily delay vacuum if some indexes are very small
> > or bulk-deletions of indexes does almost nothing such as brin. But on
> > the other hand it's possible that workers don't sleep even if shared
> > cost balance already exceeds the limit because it's necessary for
> > sleeping that local balance exceeds the worker's limit divided by the
> > number of workers. For example, a worker is scheduled doing I/O and
> > exceeds the limit substantially while other 2 workers do less I/O. And
> > then the 2 workers are scheduled and consume I/O. The total cost
> > balance already exceeds the limit but the workers will not sleep as
> > long as the local balance is less than (limit / # of workers).
> >
>
> Right, this is the reason I told to keep some threshold for local
> balance(say 50% of (limit / # of workers)).  I think we need to do
> some experiments to see what is the best thing to do.
>
I have done some experiments along this line.  I have first produced a
case where we can show the problem with the existing shared costing
patch (a worker which is doing less I/O might pay the penalty on behalf
of the worker which is doing more I/O).  I have also hacked the shared
costing patch of Sawada-san so that a worker only goes for sleep if the
shared balance has crossed the limit and its local balance has
crossed some threshold[1].

Test setup:  I have created 4 indexes on the table.  Three of the
indexes have a lot of pages to process but need to dirty only a few
pages, whereas the 4th index has very few pages to process but needs
to dirty all of them.  I have attached the test script along with the
mail.  For each worker I have shown the delay time it has incurred,
its total I/O[1], and its page hit, page miss and page dirty counts.
[1] total I/O = _nhit * VacuumCostPageHit + _nmiss *
VacuumCostPageMiss + _ndirty * VacuumCostPageDirty
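
As a quick sanity check on this formula with the default cost
parameters (vacuum_cost_page_hit = 1, vacuum_cost_page_dirty = 20),
the per-worker numbers reported below add up as expected:
17891 * 1 + 2 * 20 = 17931 for workers 0-2, and
4318 * 1 + 603 * 20 = 16378 for worker 3.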

patch 1: Shared costing patch:  (delay condition ->
VacuumSharedCostBalance > VacuumCostLimit)
worker 0 delay=80.00   total I/O=17931 hit=17891 miss=0 dirty=2
worker 1 delay=40.00   total I/O=17931 hit=17891 miss=0 dirty=2
worker 2 delay=110.00 total I/O=17931 hit=17891 miss=0 dirty=2
worker 3 delay=120.98 total I/O=16378 hit=4318   miss=0 dirty=603

Observation1:  I think here it's clearly visible that worker 3 is
doing the least total I/O but is delaying for the maximum amount of
time.  OTOH, worker 1 is delaying for very little time compared to how
much I/O it is doing.  To solve this problem, I have added a small
tweak to the patch, wherein a worker will only sleep if its local
balance has crossed some threshold.  And we can see that with that
change the problem is solved to quite an extent.

patch 2: Shared costing patch: (delay condition ->
VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance >
VacuumCostLimit/number of workers)
worker 0 delay=100.12 total I/O=17931 hit=17891 miss=0 dirty=2
worker 1 delay=90.00   total I/O=17931 hit=17891 miss=0 dirty=2
worker 2 delay=80.06   total I/O=17931 hit=17891 miss=0 dirty=2
worker 3 delay=80.72   total I/O=16378 hit=4318 miss=0 dirty=603

Observation2:  This patch solves the problem discussed with patch1, but
in some extreme cases there is a possibility that the shared balance can
become as much as twice the limit while no worker's local balance crosses
its local limit, so still no worker goes for the delay.  To solve that
there could be multiple ideas:
a) Set a max limit on the shared balance, e.g. 1.5 * VacuumCostLimit,
after which we delay whoever tries to do the I/O irrespective of its
local balance.
b) Set a somewhat lower value for the local threshold, e.g. 50% of the
local limit.

Here I have changed patch2 as per (b): if the local balance reaches
50% of the local limit and the shared balance hits the vacuum cost
limit, then go for the delay.

patch 3: Shared costing patch: (delay condition ->
VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance > 0.5
* VacuumCostLimit/number of workers)
worker 0 delay=70.03   total I/O=17931 hit=17891 miss=0 dirty=2
worker 1 delay=100.14 total I/O=17931 hit=17891 miss=0 dirty=2
worker 2 delay=80.01   total I/O=17931 hit=17891 miss=0 dirty=2
worker 3 delay=101.03 total I/O=16378 hit=4318 miss=0 dirty=603

Observation3:  I think patch3 doesn't completely solve the issue
discussed in patch1, but it's far better than patch1.  However, patch2
might have another problem, as discussed in observation2.

I think I need to do some more analysis and experiments before we can
reach a conclusion.  But one point is clear: we need to do something
to solve the problem observed with patch1 if we are going with the
shared costing approach.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Fri, Nov 8, 2019 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have done some experiments on this line.  I have first produced a
> case where we can show the problem with the existing shared costing
> patch (worker which is doing less I/O might pay the penalty on behalf
> of the worker who is doing more I/O).  I have also hacked the shared
> costing patch of Swada-san so that worker only go for sleep if the
> shared balance has crossed the limit and it's local balance has
> crossed some threadshold[1].
>
> Test setup:  I have created 4 indexes on the table.  Out of which 3
> indexes will have a lot of pages to process but need to dirty a few
> pages whereas the 4th index will have to process a very less number of
> pages but need to dirty all of them.  I have attached the test script
> along with the mail.  I have shown what is the delay time each worker
> have done.  What is total I/O[1] each worker and what is the page hit,
> page miss and page dirty count?
> [1] total I/O = _nhit * VacuumCostPageHit + _nmiss *
> VacuumCostPageMiss + _ndirty * VacuumCostPageDirty
>
> patch 1: Shared costing patch:  (delay condition ->
> VacuumSharedCostBalance > VacuumCostLimit)
> worker 0 delay=80.00   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 1 delay=40.00   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 2 delay=110.00 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 3 delay=120.98 total I/O=16378 hit=4318   miss=0 dirty=603
>
> Observation1:  I think here it's clearly visible that worker 3 is
> doing the least total I/O but delaying for maximum amount of time.
> OTOH, worker 1 is delaying for very little time compared to how much
> I/O it is doing.  So for solving this problem, I have add a small
> tweak to the patch.  Wherein the worker will only sleep if its local
> balance has crossed some threshold.  And, we can see that with that
> change the problem is solved up to quite an extent.
>
> patch 2: Shared costing patch: (delay condition ->
> VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance >
> VacuumCostLimit/number of workers)
> worker 0 delay=100.12 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 1 delay=90.00   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 2 delay=80.06   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 3 delay=80.72   total I/O=16378 hit=4318 miss=0 dirty=603
>
> Observation2:  This patch solves the problem discussed with patch1 but
> in some extreme cases there is a possibility that the shared limit can
> become twice as much as local limit and still no worker goes for the
> delay.  For solving that there could be multiple ideas a) Set the max
> limit on shared balance e.g. 1.5 * VacuumCostLimit after that we will
> give the delay whoever tries to do the I/O irrespective of its local
> balance.
> b) Set a little lower value for the local threshold e.g 50% of the  local limit
>
> Here I have changed the patch2 as per (b) If local balance reaches to
> 50% of the local limit and shared balance hit the vacuum cost limit
> then go for the delay.
>
> patch 3: Shared costing patch: (delay condition ->
> VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance > 0.5
> * VacuumCostLimit/number of workers)
> worker 0 delay=70.03   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 1 delay=100.14 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 2 delay=80.01   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 3 delay=101.03 total I/O=16378 hit=4318 miss=0 dirty=603
>
> Observation3:  I think patch3 doesn't completely solve the issue
> discussed in patch1 but its far better than patch1.
>

Yeah, I think it is difficult to get the exact balance, but we can try
to be as close as possible.  We can try to play with the threshold and
another possibility is to try to sleep in proportion to the amount of
I/O done by the worker.

Thanks for doing these experiments, but I think it is better if you
can share the modified patches so that others can also reproduce what
you are seeing.  There is no need to post the entire parallel vacuum
patch-set, but the costing-related patch can be posted with a
reference to which patches are required from the parallel vacuum
thread.  Another option is to move this discussion to the parallel
vacuum thread, but I think it is better to decide the costing model
here.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 8, 2019 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I have done some experiments on this line.  I have first produced a
> > case where we can show the problem with the existing shared costing
> > patch (worker which is doing less I/O might pay the penalty on behalf
> > of the worker who is doing more I/O).  I have also hacked the shared
> > costing patch of Swada-san so that worker only go for sleep if the
> > shared balance has crossed the limit and it's local balance has
> > crossed some threadshold[1].
> >
> > Test setup:  I have created 4 indexes on the table.  Out of which 3
> > indexes will have a lot of pages to process but need to dirty a few
> > pages whereas the 4th index will have to process a very less number of
> > pages but need to dirty all of them.  I have attached the test script
> > along with the mail.  I have shown what is the delay time each worker
> > have done.  What is total I/O[1] each worker and what is the page hit,
> > page miss and page dirty count?
> > [1] total I/O = _nhit * VacuumCostPageHit + _nmiss *
> > VacuumCostPageMiss + _ndirty * VacuumCostPageDirty
> >
> > patch 1: Shared costing patch:  (delay condition ->
> > VacuumSharedCostBalance > VacuumCostLimit)
> > worker 0 delay=80.00   total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 1 delay=40.00   total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 2 delay=110.00 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 3 delay=120.98 total I/O=16378 hit=4318   miss=0 dirty=603
> >
> > Observation1:  I think here it's clearly visible that worker 3 is
> > doing the least total I/O but delaying for maximum amount of time.
> > OTOH, worker 1 is delaying for very little time compared to how much
> > I/O it is doing.  So for solving this problem, I have add a small
> > tweak to the patch.  Wherein the worker will only sleep if its local
> > balance has crossed some threshold.  And, we can see that with that
> > change the problem is solved up to quite an extent.
> >
> > patch 2: Shared costing patch: (delay condition ->
> > VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance >
> > VacuumCostLimit/number of workers)
> > worker 0 delay=100.12 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 1 delay=90.00   total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 2 delay=80.06   total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 3 delay=80.72   total I/O=16378 hit=4318 miss=0 dirty=603
> >
> > Observation2:  This patch solves the problem discussed with patch1 but
> > in some extreme cases there is a possibility that the shared limit can
> > become twice as much as local limit and still no worker goes for the
> > delay.  For solving that there could be multiple ideas a) Set the max
> > limit on shared balance e.g. 1.5 * VacuumCostLimit after that we will
> > give the delay whoever tries to do the I/O irrespective of its local
> > balance.
> > b) Set a little lower value for the local threshold e.g 50% of the  local limit
> >
> > Here I have changed the patch2 as per (b) If local balance reaches to
> > 50% of the local limit and shared balance hit the vacuum cost limit
> > then go for the delay.
> >
> > patch 3: Shared costing patch: (delay condition ->
> > VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance > 0.5
> > * VacuumCostLimit/number of workers)
> > worker 0 delay=70.03   total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 1 delay=100.14 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 2 delay=80.01   total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 3 delay=101.03 total I/O=16378 hit=4318 miss=0 dirty=603
> >
> > Observation3:  I think patch3 doesn't completely solve the issue
> > discussed in patch1 but its far better than patch1.
> >
>
> Yeah, I think it is difficult to get the exact balance, but we can try
> to be as close as possible.  We can try to play with the threshold and
> another possibility is to try to sleep in proportion to the amount of
> I/O done by the worker.
I have done another experiment where I have made two more changes on
top of patch3:
a) Only reduce the local balance from the total shared balance
whenever the worker is applying the delay.
b) Compute the delay based on the local balance.

patch4:
worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603

I think with this approach the delay is divided among the workers quite
well compared to the other approaches.
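
For reference, here is a rough standalone sketch of how I read that
behaviour (this is not the actual POC patch; in particular the exact
delay formula, the configured delay scaled by the local balance over
the cost limit, is an assumption):

#include <stdio.h>

#define VACUUM_COST_LIMIT   200
#define VACUUM_COST_DELAY   20.0    /* msec */
#define THRESHOLD           0.5

static int  VacuumSharedCostBalance = 0;

static double
compute_sleep_patch4(int *VacuumCostBalanceLocal, int nworkers)
{
    double      msec = 0.0;

    if (VacuumSharedCostBalance >= VACUUM_COST_LIMIT &&
        *VacuumCostBalanceLocal > THRESHOLD * VACUUM_COST_LIMIT / nworkers)
    {
        /* (a) reduce only this worker's local balance from the shared one */
        VacuumSharedCostBalance -= *VacuumCostBalanceLocal;

        /* (b) compute the delay based on the local balance */
        msec = VACUUM_COST_DELAY *
            (double) *VacuumCostBalanceLocal / VACUUM_COST_LIMIT;

        *VacuumCostBalanceLocal = 0;
    }

    return msec;
}

int
main(void)
{
    int         heavy = 150;        /* worker that did most of the I/O */
    int         light = 60;

    VacuumSharedCostBalance = heavy + light;    /* 210, above the limit */

    /* the heavy worker checks first and sleeps 20 * 150 / 200 = 15 msec */
    printf("heavy sleeps %.1f msec\n", compute_sleep_patch4(&heavy, 2));

    /*
     * the shared balance is now 60, below the limit, so the light worker
     * does not sleep at all when it checks
     */
    printf("light sleeps %.1f msec\n", compute_sleep_patch4(&light, 2));

    return 0;
}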

>
> Thanks for doing these experiments, but I think it is better if you
> can share the modified patches so that others can also reproduce what
> you are seeing.  There is no need to post the entire parallel vacuum
> patch-set, but the costing related patch can be posted with a
> reference to what all patches are required from parallel vacuum
> thread.  Another option is to move this discussion to the parallel
> vacuum thread, but I think it is better to decide the costing model
> here.

I have attached the POC patches I have for testing.  Steps for testing:
1. First, apply the parallel vacuum base patch and the shared costing patch [1].
2. Apply 0001-vacuum_costing_test.patch attached to this mail.
3. Run the script shared in the previous mail [2] --> this will give the
results for patch 1 shared upthread [2].
4. Apply patch shared_costing_plus_patch[2] or [3] or [4] to see the
results with the different approaches explained in the mail.


[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFiTN-tFLN%3Dvdu5Ra-23E9_7Z1JXkk5MkRY3Bkj2zAoWK7fULA%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Nov 8, 2019 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I have done some experiments on this line.  I have first produced a
> > > case where we can show the problem with the existing shared costing
> > > patch (worker which is doing less I/O might pay the penalty on behalf
> > > of the worker who is doing more I/O).  I have also hacked the shared
> > > costing patch of Swada-san so that worker only go for sleep if the
> > > shared balance has crossed the limit and it's local balance has
> > > crossed some threadshold[1].
> > >
> > > Test setup:  I have created 4 indexes on the table.  Out of which 3
> > > indexes will have a lot of pages to process but need to dirty a few
> > > pages whereas the 4th index will have to process a very less number of
> > > pages but need to dirty all of them.  I have attached the test script
> > > along with the mail.  I have shown what is the delay time each worker
> > > have done.  What is total I/O[1] each worker and what is the page hit,
> > > page miss and page dirty count?
> > > [1] total I/O = _nhit * VacuumCostPageHit + _nmiss *
> > > VacuumCostPageMiss + _ndirty * VacuumCostPageDirty
> > >
> > > patch 1: Shared costing patch:  (delay condition ->
> > > VacuumSharedCostBalance > VacuumCostLimit)
> > > worker 0 delay=80.00   total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 1 delay=40.00   total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 2 delay=110.00 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 3 delay=120.98 total I/O=16378 hit=4318   miss=0 dirty=603
> > >
> > > Observation1:  I think here it's clearly visible that worker 3 is
> > > doing the least total I/O but delaying for maximum amount of time.
> > > OTOH, worker 1 is delaying for very little time compared to how much
> > > I/O it is doing.  So for solving this problem, I have add a small
> > > tweak to the patch.  Wherein the worker will only sleep if its local
> > > balance has crossed some threshold.  And, we can see that with that
> > > change the problem is solved up to quite an extent.
> > >
> > > patch 2: Shared costing patch: (delay condition ->
> > > VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance >
> > > VacuumCostLimit/number of workers)
> > > worker 0 delay=100.12 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 1 delay=90.00   total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 2 delay=80.06   total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 3 delay=80.72   total I/O=16378 hit=4318 miss=0 dirty=603
> > >
> > > Observation2:  This patch solves the problem discussed with patch1 but
> > > in some extreme cases there is a possibility that the shared limit can
> > > become twice as much as local limit and still no worker goes for the
> > > delay.  For solving that there could be multiple ideas a) Set the max
> > > limit on shared balance e.g. 1.5 * VacuumCostLimit after that we will
> > > give the delay whoever tries to do the I/O irrespective of its local
> > > balance.
> > > b) Set a little lower value for the local threshold e.g 50% of the  local limit
> > >
> > > Here I have changed the patch2 as per (b) If local balance reaches to
> > > 50% of the local limit and shared balance hit the vacuum cost limit
> > > then go for the delay.
> > >
> > > patch 3: Shared costing patch: (delay condition ->
> > > VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance > 0.5
> > > * VacuumCostLimit/number of workers)
> > > worker 0 delay=70.03   total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 1 delay=100.14 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 2 delay=80.01   total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 3 delay=101.03 total I/O=16378 hit=4318 miss=0 dirty=603
> > >
> > > Observation3:  I think patch3 doesn't completely solve the issue
> > > discussed in patch1 but its far better than patch1.
> > >
> >
> > Yeah, I think it is difficult to get the exact balance, but we can try
> > to be as close as possible.  We can try to play with the threshold and
> > another possibility is to try to sleep in proportion to the amount of
> > I/O done by the worker.
> I have done another experiment where I have done another 2 changes on
> top op patch3
> a) Only reduce the local balance from the total shared balance
> whenever it's applying delay
> b) Compute the delay based on the local balance.
>
> patch4:
> worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
>
> I think with this approach the delay is divided among the worker quite
> well compared to other approaches
>
> >
> > Thanks for doing these experiments, but I think it is better if you
> > can share the modified patches so that others can also reproduce what
> > you are seeing.  There is no need to post the entire parallel vacuum
> > patch-set, but the costing related patch can be posted with a
> > reference to what all patches are required from parallel vacuum
> > thread.  Another option is to move this discussion to the parallel
> > vacuum thread, but I think it is better to decide the costing model
> > here.
>
> I have attached the POC patches I have for testing.  Step for testing
> 1. First, apply the parallel vacuum base patch and the shared costing patch[1].
> 2. Apply 0001-vacuum_costing_test.patch attached in the mail
> 3. Run the script shared in previous mail [2]. --> this will give the
> results for patch 1 shared upthread[2]
> 4. Apply patch shared_costing_plus_patch[2] or [3] or [4] to see the
> results with different approaches explained in the mail.
>
>
> [1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CAFiTN-tFLN%3Dvdu5Ra-23E9_7Z1JXkk5MkRY3Bkj2zAoWK7fULA%40mail.gmail.com
>
I have tested the same with some other workload (test file attached).
I can see the same behaviour with this workload as well: with patch 4
the distribution of the delay is better compared to the other patches,
i.e. workers with more I/O have more delay and workers with equal I/O
have almost equal delay.  The only thing is that the total delay with
patch 4 is slightly less compared to the other patches.

patch1:
  worker 0 delay=120.000000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=170.000000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=210.000000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=263.400000 total io=44322 hit=8352 miss=1199 dirty=1199

patch2:
  worker 0 delay=190.645000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=160.090000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=170.775000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=243.180000 total io=44322 hit=8352 miss=1199 dirty=1199

patch3:
  worker 0 delay=191.765000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=180.935000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=201.305000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=192.770000 total io=44322 hit=8352 miss=1199 dirty=1199

patch4:
  worker 0 delay=175.290000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=174.135000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=175.560000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=212.100000 total io=44322 hit=8352 miss=1199 dirty=1199

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Yeah, I think it is difficult to get the exact balance, but we can try
> > > to be as close as possible.  We can try to play with the threshold and
> > > another possibility is to try to sleep in proportion to the amount of
> > > I/O done by the worker.
> > I have done another experiment where I have done another 2 changes on
> > top op patch3
> > a) Only reduce the local balance from the total shared balance
> > whenever it's applying delay
> > b) Compute the delay based on the local balance.
> >
> > patch4:
> > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> >
> > I think with this approach the delay is divided among the worker quite
> > well compared to other approaches
> >
> > >
..
> I have tested the same with some other workload(test file attached).
> I can see the same behaviour with this workload as well that with the
> patch 4 the distribution of the delay is better compared to other
> patches i.e. worker with more I/O have more delay and with equal IO
> have alsomost equal delay.  Only thing is that the total delay with
> the patch 4 is slightly less compared to other pacthes.
>

I see one problem with the formula you have used in the patch, maybe
that is causing the value of total delay to go down.

- if (new_balance >= VacuumCostLimit)
+ VacuumCostBalanceLocal += VacuumCostBalance;
+ if ((new_balance >= VacuumCostLimit) &&
+ (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))

As per discussion, the second part of the condition should be
"VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".  I think
you can change this and try again.  Also, please try with
different values of the threshold (0.3, 0.5, 0.7, etc.).
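
For example, with (say) VacuumCostLimit = 200 and nworker = 4, the
expression in the patch evaluates to 200 / (0.5 * 4) = 100, whereas the
intended threshold is 0.5 * 200 / 4 = 25; with a local threshold four
times higher than intended, the workers sleep much less often, which
could explain the lower total delay.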

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Yeah, I think it is difficult to get the exact balance, but we can try
> > > > to be as close as possible.  We can try to play with the threshold and
> > > > another possibility is to try to sleep in proportion to the amount of
> > > > I/O done by the worker.
> > > I have done another experiment where I have done another 2 changes on
> > > top op patch3
> > > a) Only reduce the local balance from the total shared balance
> > > whenever it's applying delay
> > > b) Compute the delay based on the local balance.
> > >
> > > patch4:
> > > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> > >
> > > I think with this approach the delay is divided among the worker quite
> > > well compared to other approaches
> > >
> > > >
> ..
> > I have tested the same with some other workload(test file attached).
> > I can see the same behaviour with this workload as well that with the
> > patch 4 the distribution of the delay is better compared to other
> > patches i.e. worker with more I/O have more delay and with equal IO
> > have alsomost equal delay.  Only thing is that the total delay with
> > the patch 4 is slightly less compared to other pacthes.
> >
>
> I see one problem with the formula you have used in the patch, maybe
> that is causing the value of total delay to go down.
>
> - if (new_balance >= VacuumCostLimit)
> + VacuumCostBalanceLocal += VacuumCostBalance;
> + if ((new_balance >= VacuumCostLimit) &&
> + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
>
> As per discussion, the second part of the condition should be
> "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".
My bad.
> I think
> you can once change this and try again.  Also, please try with the
> different values of threshold (0.3, 0.5, 0.7, etc.).
>
Okay, I will retest with both patch3 and patch4 for both scenarios.
I will also try with different multipliers.

> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Mon, Nov 11, 2019 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > ..
> > > I have tested the same with some other workload(test file attached).
> > > I can see the same behaviour with this workload as well that with the
> > > patch 4 the distribution of the delay is better compared to other
> > > patches i.e. worker with more I/O have more delay and with equal IO
> > > have alsomost equal delay.  Only thing is that the total delay with
> > > the patch 4 is slightly less compared to other pacthes.
> > >
> >
> > I see one problem with the formula you have used in the patch, maybe
> > that is causing the value of total delay to go down.
> >
> > - if (new_balance >= VacuumCostLimit)
> > + VacuumCostBalanceLocal += VacuumCostBalance;
> > + if ((new_balance >= VacuumCostLimit) &&
> > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> >
> > As per discussion, the second part of the condition should be
> > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".
> My Bad
> I think
> > you can once change this and try again.  Also, please try with the
> > different values of threshold (0.3, 0.5, 0.7, etc.).
> >
> Okay, I will retest with both patch3 and path4 for both the scenarios.
> I will also try with different multipliers.
>

One more thing: I think we should also test these cases with a varying
number of indexes (say 2, 6, 8, etc.), and then probably we should test
with a varying number of workers where the number of workers is less
than the number of indexes.  You can do these after finishing your
previous experiments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Yeah, I think it is difficult to get the exact balance, but we can try
> > > > to be as close as possible.  We can try to play with the threshold and
> > > > another possibility is to try to sleep in proportion to the amount of
> > > > I/O done by the worker.
> > > I have done another experiment where I have done another 2 changes on
> > > top op patch3
> > > a) Only reduce the local balance from the total shared balance
> > > whenever it's applying delay
> > > b) Compute the delay based on the local balance.
> > >
> > > patch4:
> > > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> > >
> > > I think with this approach the delay is divided among the worker quite
> > > well compared to other approaches
> > >
> > > >
> ..
> > I have tested the same with some other workload(test file attached).
> > I can see the same behaviour with this workload as well that with the
> > patch 4 the distribution of the delay is better compared to other
> > patches i.e. worker with more I/O have more delay and with equal IO
> > have alsomost equal delay.  Only thing is that the total delay with
> > the patch 4 is slightly less compared to other pacthes.
> >
>
> I see one problem with the formula you have used in the patch, maybe
> that is causing the value of total delay to go down.
>
> - if (new_balance >= VacuumCostLimit)
> + VacuumCostBalanceLocal += VacuumCostBalance;
> + if ((new_balance >= VacuumCostLimit) &&
> + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
>
> As per discussion, the second part of the condition should be
> "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".  I think
> you can once change this and try again.  Also, please try with the
> different values of threshold (0.3, 0.5, 0.7, etc.).
>
I have modified patch4 and run it with different threshold values.
But I don't see much difference in the results with patch4.  In fact,
I removed the condition for the local balancing check completely and
the delays are still the same.  I think this is because with patch4
workers only reduce their own balance and also delay in proportion to
their local balance, so maybe the second condition will not have much
impact.

Patch4 (test.sh)
0
  worker 0 delay=82.380000 total io=17931 hit=17891 miss=0 dirty=2
  worker 1 delay=89.370000 total io=17931 hit=17891 miss=0 dirty=2
  worker 2 delay=89.645000 total io=17931 hit=17891 miss=0 dirty=2
  worker 3 delay=79.150000 total io=16378 hit=4318 miss=0 dirty=603

0.1
  worker 0 delay=89.295000 total io=17931 hit=17891 miss=0 dirty=2
  worker 1 delay=89.230000 total io=17931 hit=17891 miss=0 dirty=2
  worker 2 delay=89.675000 total io=17931 hit=17891 miss=0 dirty=2
  worker 3 delay=81.840000 total io=16378 hit=4318 miss=0 dirty=603

0.3
  worker 0 delay=85.915000 total io=17931 hit=17891 miss=0 dirty=2
  worker 1 delay=85.180000 total io=17931 hit=17891 miss=0 dirty=2
  worker 2 delay=88.760000 total io=17931 hit=17891 miss=0 dirty=2
  worker 3 delay=81.975000 total io=16378 hit=4318 miss=0 dirty=603

0.5
  worker 0 delay=81.635000 total io=17931 hit=17891 miss=0 dirty=2
  worker 1 delay=87.490000 total io=17931 hit=17891 miss=0 dirty=2
  worker 2 delay=89.425000 total io=17931 hit=17891 miss=0 dirty=2
  worker 3 delay=82.050000 total io=16378 hit=4318 miss=0 dirty=603

0.7
  worker 0 delay=85.185000 total io=17931 hit=17891 miss=0 dirty=2
  worker 1 delay=88.835000 total io=17931 hit=17891 miss=0 dirty=2
  worker 2 delay=86.005000 total io=17931 hit=17891 miss=0 dirty=2
  worker 3 delay=76.160000 total io=16378 hit=4318 miss=0 dirty=603

Patch4 (test1.sh)
0
  worker 0 delay=179.005000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=179.010000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=179.010000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=221.900000 total io=44322 hit=8352 miss=1199 dirty=1199

0.1
  worker 0 delay=177.840000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=179.465000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=179.255000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=222.695000 total io=44322 hit=8352 miss=1199 dirty=1199

0.3
  worker 0 delay=178.295000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=178.720000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=178.270000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=220.420000 total io=44322 hit=8352 miss=1199 dirty=1199

0.5
  worker 0 delay=178.415000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=178.385000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=173.805000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=221.605000 total io=44322 hit=8352 miss=1199 dirty=1199

0.7
  worker 0 delay=175.330000 total io=35828 hit=35788 miss=0 dirty=2
  worker 1 delay=177.890000 total io=35828 hit=35788 miss=0 dirty=2
  worker 2 delay=167.540000 total io=35828 hit=35788 miss=0 dirty=2
  worker 3 delay=216.725000 total io=44322 hit=8352 miss=1199 dirty=1199


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > >
> > > > > Yeah, I think it is difficult to get the exact balance, but we can try
> > > > > to be as close as possible.  We can try to play with the threshold and
> > > > > another possibility is to try to sleep in proportion to the amount of
> > > > > I/O done by the worker.
> > > > I have done another experiment where I have done another 2 changes on
> > > > top op patch3
> > > > a) Only reduce the local balance from the total shared balance
> > > > whenever it's applying delay
> > > > b) Compute the delay based on the local balance.
> > > >
> > > > patch4:
> > > > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> > > >
> > > > I think with this approach the delay is divided among the worker quite
> > > > well compared to other approaches
> > > >
> > > > >
> > ..
> > > I have tested the same with some other workload(test file attached).
> > > I can see the same behaviour with this workload as well that with the
> > > patch 4 the distribution of the delay is better compared to other
> > > patches i.e. worker with more I/O have more delay and with equal IO
> > > have alsomost equal delay.  Only thing is that the total delay with
> > > the patch 4 is slightly less compared to other pacthes.
> > >
> >
> > I see one problem with the formula you have used in the patch, maybe
> > that is causing the value of total delay to go down.
> >
> > - if (new_balance >= VacuumCostLimit)
> > + VacuumCostBalanceLocal += VacuumCostBalance;
> > + if ((new_balance >= VacuumCostLimit) &&
> > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> >
> > As per discussion, the second part of the condition should be
> > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".  I think
> > you can once change this and try again.  Also, please try with the
> > different values of threshold (0.3, 0.5, 0.7, etc.).
> >
> I have modified the patch4 and ran with different values.  But, I
> don't see much difference in the values with the patch4.  Infact I
> removed the condition for the local balancing check completely still
> the delays are the same,  I think this is because with patch4 worker
> are only reducing their own balance and also delaying as much as their
> local balance.  So maybe the second condition will not have much
> impact.
>
> Patch4 (test.sh)
> 0
>   worker 0 delay=82.380000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 1 delay=89.370000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 2 delay=89.645000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 3 delay=79.150000 total io=16378 hit=4318 miss=0 dirty=603
>
> 0.1
>   worker 0 delay=89.295000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 1 delay=89.230000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 2 delay=89.675000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 3 delay=81.840000 total io=16378 hit=4318 miss=0 dirty=603
>
> 0.3
>   worker 0 delay=85.915000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 1 delay=85.180000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 2 delay=88.760000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 3 delay=81.975000 total io=16378 hit=4318 miss=0 dirty=603
>
> 0.5
>   worker 0 delay=81.635000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 1 delay=87.490000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 2 delay=89.425000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 3 delay=82.050000 total io=16378 hit=4318 miss=0 dirty=603
>
> 0.7
>   worker 0 delay=85.185000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 1 delay=88.835000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 2 delay=86.005000 total io=17931 hit=17891 miss=0 dirty=2
>   worker 3 delay=76.160000 total io=16378 hit=4318 miss=0 dirty=603
>
> Patch4 (test1.sh)
> 0
>   worker 0 delay=179.005000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 1 delay=179.010000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 2 delay=179.010000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 3 delay=221.900000 total io=44322 hit=8352 miss=1199 dirty=1199
>
> 0.1
>   worker 0 delay=177.840000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 1 delay=179.465000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 2 delay=179.255000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 3 delay=222.695000 total io=44322 hit=8352 miss=1199 dirty=1199
>
> 0.3
>   worker 0 delay=178.295000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 1 delay=178.720000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 2 delay=178.270000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 3 delay=220.420000 total io=44322 hit=8352 miss=1199 dirty=1199
>
> 0.5
>   worker 0 delay=178.415000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 1 delay=178.385000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 2 delay=173.805000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 3 delay=221.605000 total io=44322 hit=8352 miss=1199 dirty=1199
>
> 0.7
>   worker 0 delay=175.330000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 1 delay=177.890000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 2 delay=167.540000 total io=35828 hit=35788 miss=0 dirty=2
>   worker 3 delay=216.725000 total io=44322 hit=8352 miss=1199 dirty=1199
>
I have revised patch4 so that it doesn't depend upon a fixed
number of workers; instead I have dynamically updated the worker
count.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > Yeah, I think it is difficult to get the exact balance, but we can try
> > > > > > to be as close as possible.  We can try to play with the threshold and
> > > > > > another possibility is to try to sleep in proportion to the amount of
> > > > > > I/O done by the worker.
> > > > > I have done another experiment where I have done another 2 changes on
> > > > > top op patch3
> > > > > a) Only reduce the local balance from the total shared balance
> > > > > whenever it's applying delay
> > > > > b) Compute the delay based on the local balance.
> > > > >
> > > > > patch4:
> > > > > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> > > > >
> > > > > I think with this approach the delay is divided among the worker quite
> > > > > well compared to other approaches
> > > > >
> > > > > >
> > > ..
> > > > I have tested the same with some other workload(test file attached).
> > > > I can see the same behaviour with this workload as well that with the
> > > > patch 4 the distribution of the delay is better compared to other
> > > > patches i.e. worker with more I/O have more delay and with equal IO
> > > > have alsomost equal delay.  Only thing is that the total delay with
> > > > the patch 4 is slightly less compared to other pacthes.
> > > >
> > >
> > > I see one problem with the formula you have used in the patch, maybe
> > > that is causing the value of total delay to go down.
> > >
> > > - if (new_balance >= VacuumCostLimit)
> > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > + if ((new_balance >= VacuumCostLimit) &&
> > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > >
> > > As per discussion, the second part of the condition should be
> > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".  I think
> > > you can once change this and try again.  Also, please try with the
> > > different values of threshold (0.3, 0.5, 0.7, etc.).
> > >
> > I have modified the patch4 and ran with different values.  But, I
> > don't see much difference in the values with the patch4.  Infact I
> > removed the condition for the local balancing check completely still
> > the delays are the same,  I think this is because with patch4 worker
> > are only reducing their own balance and also delaying as much as their
> > local balance.  So maybe the second condition will not have much
> > impact.
> >

Yeah, but I suspect the condition you mentioned (only try to perform
the delay when the local balance exceeds a certain threshold) can have
an impact in some other scenarios, so it is better to retain it.  I
feel the overall results look sane and the approach seems reasonable
to me.

> >
> I have revised the patch4 so that it doesn't depent upon the fix
> number of workers, instead I have dynamically updated the worker
> count.
>

Thanks.  Sawada-San, by any chance, can you try some of the tests done
by Dilip or some similar tests just to rule out any sort of
machine-specific dependency?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Masahiko Sawada
Date:
On Tue, 12 Nov 2019 at 19:08, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Yeah, I think it is difficult to get the exact balance, but we can try
> > > > > > > to be as close as possible.  We can try to play with the threshold and
> > > > > > > another possibility is to try to sleep in proportion to the amount of
> > > > > > > I/O done by the worker.
> > > > > > I have done another experiment where I have done another 2 changes on
> > > > > > top op patch3
> > > > > > a) Only reduce the local balance from the total shared balance
> > > > > > whenever it's applying delay
> > > > > > b) Compute the delay based on the local balance.
> > > > > >
> > > > > > patch4:
> > > > > > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> > > > > >
> > > > > > I think with this approach the delay is divided among the worker quite
> > > > > > well compared to other approaches
> > > > > >
> > > > > > >
> > > > ..
> > > > > I have tested the same with some other workload(test file attached).
> > > > > I can see the same behaviour with this workload as well that with the
> > > > > patch 4 the distribution of the delay is better compared to other
> > > > > patches i.e. worker with more I/O have more delay and with equal IO
> > > > > have alsomost equal delay.  Only thing is that the total delay with
> > > > > the patch 4 is slightly less compared to other pacthes.
> > > > >
> > > >
> > > > I see one problem with the formula you have used in the patch, maybe
> > > > that is causing the value of total delay to go down.
> > > >
> > > > - if (new_balance >= VacuumCostLimit)
> > > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > > + if ((new_balance >= VacuumCostLimit) &&
> > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > > >
> > > > As per discussion, the second part of the condition should be
> > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".  I think
> > > > you can once change this and try again.  Also, please try with the
> > > > different values of threshold (0.3, 0.5, 0.7, etc.).
> > > >
> > > I have modified the patch4 and ran with different values.  But, I
> > > don't see much difference in the values with the patch4.  Infact I
> > > removed the condition for the local balancing check completely still
> > > the delays are the same,  I think this is because with patch4 worker
> > > are only reducing their own balance and also delaying as much as their
> > > local balance.  So maybe the second condition will not have much
> > > impact.
> > >
>
> Yeah, but I suspect the condition (when the local balance exceeds a
> certain threshold, then only try to perform delay) you mentioned can
> have an impact in some other scenarios.  So, it is better to retain
> the same.  I feel the overall results look sane and the approach seems
> reasonable to me.
>
> > >
> > I have revised the patch4 so that it doesn't depent upon the fix
> > number of workers, instead I have dynamically updated the worker
> > count.
> >
>
> Thanks.  Sawada-San, by any chance, can you try some of the tests done
> by Dilip or some similar tests just to rule out any sort of
> machine-specific dependency?

Sure. I'll try it tomorrow.

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: cost based vacuum (parallel)

From
Masahiko Sawada
Date:
On Tue, 12 Nov 2019 at 20:22, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 12 Nov 2019 at 19:08, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Yeah, I think it is difficult to get the exact balance, but we can try
> > > > > > > > to be as close as possible.  We can try to play with the threshold and
> > > > > > > > another possibility is to try to sleep in proportion to the amount of
> > > > > > > > I/O done by the worker.
> > > > > > > I have done another experiment where I have done another 2 changes on
> > > > > > > top op patch3
> > > > > > > a) Only reduce the local balance from the total shared balance
> > > > > > > whenever it's applying delay
> > > > > > > b) Compute the delay based on the local balance.
> > > > > > >
> > > > > > > patch4:
> > > > > > > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> > > > > > >
> > > > > > > I think with this approach the delay is divided among the worker quite
> > > > > > > well compared to other approaches
> > > > > > >
> > > > > > > >
> > > > > ..
> > > > > > I have tested the same with some other workload(test file attached).
> > > > > > I can see the same behaviour with this workload as well that with the
> > > > > > patch 4 the distribution of the delay is better compared to other
> > > > > > patches i.e. worker with more I/O have more delay and with equal IO
> > > > > > have alsomost equal delay.  Only thing is that the total delay with
> > > > > > the patch 4 is slightly less compared to other pacthes.
> > > > > >
> > > > >
> > > > > I see one problem with the formula you have used in the patch, maybe
> > > > > that is causing the value of total delay to go down.
> > > > >
> > > > > - if (new_balance >= VacuumCostLimit)
> > > > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > > > + if ((new_balance >= VacuumCostLimit) &&
> > > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > > > >
> > > > > As per discussion, the second part of the condition should be
> > > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".  I think
> > > > > you can once change this and try again.  Also, please try with the
> > > > > different values of threshold (0.3, 0.5, 0.7, etc.).
> > > > >
> > > > I have modified the patch4 and ran with different values.  But, I
> > > > don't see much difference in the values with the patch4.  Infact I
> > > > removed the condition for the local balancing check completely still
> > > > the delays are the same,  I think this is because with patch4 worker
> > > > are only reducing their own balance and also delaying as much as their
> > > > local balance.  So maybe the second condition will not have much
> > > > impact.
> > > >
> >
> > Yeah, but I suspect the condition (when the local balance exceeds a
> > certain threshold, then only try to perform delay) you mentioned can
> > have an impact in some other scenarios.  So, it is better to retain
> > the same.  I feel the overall results look sane and the approach seems
> > reasonable to me.
> >
> > > >
> > > I have revised the patch4 so that it doesn't depent upon the fix
> > > number of workers, instead I have dynamically updated the worker
> > > count.
> > >
> >
> > Thanks.  Sawada-San, by any chance, can you try some of the tests done
> > by Dilip or some similar tests just to rule out any sort of
> > machine-specific dependency?
>
> Sure. I'll try it tomorrow.

I've done some tests while changing the shared buffer size, delays,
and the number of workers.  The overall results have a similar
tendency to the results shared by Dilip and look reasonable to me.

* test.sh

shared_buffers = '4GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=89.315000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=88.860000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=89.290000 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=81.805000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '1GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=89.210000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=89.325000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=88.870000 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=81.735000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '512MB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=88.480000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=88.635000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=88.600000 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=81.660000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '512MB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 5;
worker 0 delay=447.725000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=445.850000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=445.125000 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=409.025000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '512MB';
max_parallel_maintenance_workers = 2;
vacuum_cost_delay = 5;
worker 0 delay=854.750000 total io=34309 hit=22209 miss=0 dirty=605
worker 1 delay=446.500000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=444.175000 total io=17931 hit=17891 miss=0 dirty=2

---
* test1.sh

shared_buffers = '4GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=178.205000 total io=35828 hit=35788 miss=0 dirty=2
worker 1 delay=178.550000 total io=35828 hit=35788 miss=0 dirty=2
worker 2 delay=178.660000 total io=35828 hit=35788 miss=0 dirty=2
worker 3 delay=221.280000 total io=44322 hit=8352 miss=1199 dirty=1199

shared_buffers = '1GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=178.035000 total io=35828 hit=35788 miss=0 dirty=2
worker 1 delay=178.535000 total io=35828 hit=35788 miss=0 dirty=2
worker 2 delay=178.585000 total io=35828 hit=35788 miss=0 dirty=2
worker 3 delay=221.465000 total io=44322 hit=8352 miss=1199 dirty=1199

shared_buffers = '512MB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=1795.900000 total io=357911 hit=1 miss=35787 dirty=2
worker 1 delay=1790.700000 total io=357911 hit=1 miss=35787 dirty=2
worker 2 delay=179.000000 total io=35828 hit=35788 miss=0 dirty=2
worker 3 delay=221.355000 total io=44322 hit=8352 miss=1199 dirty=1199

shared_buffers = '512MB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 5;
worker 0 delay=8958.500000 total io=357911 hit=1 miss=35787 dirty=2
worker 1 delay=8950.000000 total io=357911 hit=1 miss=35787 dirty=2
worker 2 delay=894.150000 total io=35828 hit=35788 miss=0 dirty=2
worker 3 delay=1106.400000 total io=44322 hit=8352 miss=1199 dirty=1199

shared_buffers = '512MB';
max_parallel_maintenance_workers = 2;
vacuum_cost_delay = 5;
worker 0 delay=8956.500000 total io=357911 hit=1 miss=35787 dirty=2
worker 1 delay=8955.050000 total io=357893 hit=3 miss=35785 dirty=2
worker 2 delay=2002.825000 total io=80150 hit=44140 miss=1199 dirty=1201

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: cost based vacuum (parallel)

From
Mahendra Singh
Date:
On Mon, 11 Nov 2019 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > ..
> > > > I have tested the same with some other workload(test file attached).
> > > > I can see the same behaviour with this workload as well that with the
> > > > patch 4 the distribution of the delay is better compared to other
> > > > patches i.e. worker with more I/O have more delay and with equal IO
> > > > have alsomost equal delay.  Only thing is that the total delay with
> > > > the patch 4 is slightly less compared to other pacthes.
> > > >
> > >
> > > I see one problem with the formula you have used in the patch, maybe
> > > that is causing the value of total delay to go down.
> > >
> > > - if (new_balance >= VacuumCostLimit)
> > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > + if ((new_balance >= VacuumCostLimit) &&
> > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > >
> > > As per discussion, the second part of the condition should be
> > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".
> > My Bad
> > I think
> > > you can once change this and try again.  Also, please try with the
> > > different values of threshold (0.3, 0.5, 0.7, etc.).
> > >
> > Okay, I will retest with both patch3 and path4 for both the scenarios.
> > I will also try with different multipliers.
> >
>
> One more thing, I think we should also test these cases with a varying
> number of indexes (say 2,6,8,etc.) and then probably, we should test
> by a varying number of workers where the number of workers are lesser
> than indexes.  You can do these after finishing your previous
> experiments.

On top of the parallel vacuum patch, I applied Dilip's patch (0001-vacuum_costing_test.patch). I have tested by varying the number of indexes and the number of workers, comparing the shared costing base patch (0001-vacuum_costing_test.patch) with the latest shared costing patch (shared_costing_plus_patch4_v1.patch).
With the shared costing base patch, I can see that the delay is not in sync with the I/O, which is resolved by applying shared_costing_plus_patch4_v1.patch.  I have also observed that the total delay is slightly reduced with shared_costing_plus_patch4_v1.patch.

Below is the full testing summary:
Test setup:
step1) Apply parallel vacuum patch
step2) Apply 0001-vacuum_costing_test.patch (on top of this patch, the delay is not in sync with the I/O)
step3) Apply shared_costing_plus_patch4_v1.patch (the delay is in sync with the I/O)

Configuration settings:
autovacuum = off
max_parallel_workers = 30
shared_buffers = 2GB
max_parallel_maintenance_workers = 20
vacuum_cost_limit = 2000
vacuum_cost_delay = 10

Test 1: Vary indexes (2,4,6,8) while parallel workers are fixed at 4:

Case 1) When indexes are 2:
Without shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 1 delay=60.000000 total io=17931 hit=17891 miss=0 dirty=2

With  shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=87.780000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 1 delay=87.995000 total io=17931 hit=17891 miss=0 dirty=2

Case 2) When indexes are 4:
Without shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 1 delay=80.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 2 delay=60.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 3 delay=100.000000 total io=17931 hit=17891 miss=0 dirty=2

With  shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=87.430000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 1 delay=87.175000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 2 delay=86.340000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 3 delay=88.020000 total io=17931 hit=17891 miss=0 dirty=2

Case 3) When indexes are 6:
Without shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=110.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 1 delay=100.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 2 delay=160.000000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 3 delay=90.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 4 delay=80.000000 total io=17931 hit=17891 miss=0 dirty=2

With  shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=173.195000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 1 delay=88.715000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 2 delay=87.710000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 3 delay=86.460000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 4 delay=89.435000 total io=17931 hit=17891 miss=0 dirty=2

Case 4) When indexes are 8:
Without shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 1 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 2 delay=130.000000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 3 delay=190.000000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 4 delay=110.000000 total io=35862 hit=35782 miss=0 dirty=4

With  shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=174.700000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 1 delay=177.880000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 2 delay=89.460000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 3 delay=177.320000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 4 delay=86.810000 total io=17931 hit=17891 miss=0 dirty=2

Test 2: Indexes are 16 but parallel workers are 2,4,8:

Case 1) When 2 parallel workers:
Without shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=1513.230000 total io=307197 hit=85167 miss=22179 dirty=12
WARNING:  worker 1 delay=1543.385000 total io=326553 hit=63133 miss=26322 dirty=10
WARNING:  worker 2 delay=1633.625000 total io=302199 hit=65839 miss=23616 dirty=10

With  shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=1539.475000 total io=308175 hit=65175 miss=24280 dirty=10
WARNING:  worker 1 delay=1251.200000 total io=250692 hit=71562 miss=17893 dirty=10
WARNING:  worker 2 delay=1143.690000 total io=228987 hit=93857 miss=13489 dirty=12

Case 2) When 4 parallel workers:
Without shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=1182.430000 total io=213567 hit=16037 miss=19745 dirty=4
WARNING:  worker 1 delay=1202.710000 total io=178941 hit=1 miss=17890 dirty=2
WARNING:  worker 2 delay=210.000000 total io=89655 hit=89455 miss=0 dirty=10
WARNING:  worker 3 delay=270.000000 total io=71724 hit=71564 miss=0 dirty=8
WARNING:  worker 4 delay=851.825000 total io=188229 hit=58619 miss=12945 dirty=8

With  shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=1136.875000 total io=227679 hit=14469 miss=21313 dirty=4
WARNING:  worker 1 delay=973.745000 total io=196881 hit=17891 miss=17891 dirty=4
WARNING:  worker 2 delay=447.410000 total io=89655 hit=89455 miss=0 dirty=10
WARNING:  worker 3 delay=833.235000 total io=168228 hit=40958 miss=12715 dirty=6
WARNING:  worker 4 delay=683.200000 total io=136488 hit=64368 miss=7196 dirty=8

Case 3) When 8 parallel workers:
Without shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=1022.300000 total io=178941 hit=1 miss=17890 dirty=2
WARNING:  worker 1 delay=1072.770000 total io=178941 hit=1 miss=17890 dirty=2
WARNING:  worker 2 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 3 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 4 delay=140.035000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 5 delay=200.000000 total io=53802 hit=53672 miss=1 dirty=6
WARNING:  worker 6 delay=130.000000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 7 delay=150.000000 total io=53793 hit=53673 miss=0 dirty=6

With  shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=872.800000 total io=178941 hit=1 miss=17890 dirty=2
WARNING:  worker 1 delay=885.950000 total io=178941 hit=1 miss=17890 dirty=2
WARNING:  worker 2 delay=175.680000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 3 delay=259.560000 total io=53793 hit=53673 miss=0 dirty=6
WARNING:  worker 4 delay=169.945000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 5 delay=613.845000 total io=125100 hit=45750 miss=7923 dirty=6
WARNING:  worker 6 delay=171.895000 total io=35862 hit=35782 miss=0 dirty=4
WARNING:  worker 7 delay=176.505000 total io=35862 hit=35782 miss=0 dirty=4


Thanks and Regards
Mahendra Thalor

Re: cost based vacuum (parallel)

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 10:02 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> I've done some tests while changing shared buffer size, delays and
> number of workers. The overall results has the similar tendency as the
> result shared by Dilip and looks reasonable to me.
>

Thanks, Sawada-san, for repeating the tests.  I can see from your,
Dilip's, and Mahendra's testing that the delay is distributed
depending upon the I/O done by a particular worker, and that the total
I/O is also as expected in various kinds of scenarios.  So, I think
this is a better approach.  Do you agree, or do you think we should
still investigate the other approach further?

I would like to summarize this approach.  The basic idea for parallel
vacuum is to allow the parallel workers and master backend to have a
shared view of vacuum cost related parameters (mainly
VacuumCostBalance) and allow each worker to update it and then based
on that decide whether it needs to sleep.  With this basic idea, we
found that in some cases the throttling is not accurate, as explained
with an example in my email above [1] and shown by the tests performed
by Dilip and others in the follow-up emails (in short, the workers
doing more I/O can be throttled less).  Then, as discussed in a later
email [2], we tried a way to avoid making workers sleep when they have
done less or no I/O as compared to the other workers.  This ensured
that workers doing more I/O get throttled more.  The idea is to allow
a worker to sleep only if it has performed I/O above a certain
threshold and the overall balance is more than the cost_limit set by
the system.  In that case, the worker sleeps in proportion to the work
done by it and reduces VacuumSharedCostBalance by the amount it has
consumed.  This scheme leads to the desired throttling of different
workers based on the work done by each individual worker.
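
To make the scheme concrete, below is a small standalone simulation
(an illustration only, not the code of the patch; the per-worker page
counts, cost limit, delay, and threshold values are made-up numbers).
For instance, with a cost limit of 2000, 4 workers, and a 0.5
threshold, a worker must itself have accumulated more than
0.5 * 2000 / 4 = 250 of balance before it is allowed to sleep.

/*
 * Standalone simulation of the shared cost-balance sleep rule sketched
 * above: each worker adds its cost to a shared balance and to its own
 * local balance, and it sleeps only when the shared balance exceeds the
 * limit AND its local balance exceeds a threshold share of the limit.
 * The sleep is proportional to the local balance, which is then charged
 * back against the shared balance.  All numbers are illustrative.
 */
#include <stdio.h>

#define NWORKERS        4
#define COST_LIMIT      2000    /* stand-in for vacuum_cost_limit */
#define COST_DELAY_MS   1.0     /* stand-in for vacuum_cost_delay */
#define THRESHOLD       0.5     /* local-balance threshold discussed above */

int
main(void)
{
    /* per-worker number of page accesses, each costing 1 (a buffer hit) */
    int     pages[NWORKERS] = {17931, 17931, 17931, 4318};
    int     local[NWORKERS] = {0};
    double  delay[NWORKERS] = {0.0};
    int     shared = 0;
    int     remaining = 1;
    int     w;

    while (remaining)
    {
        remaining = 0;
        for (w = 0; w < NWORKERS; w++)
        {
            if (pages[w] <= 0)
                continue;
            remaining = 1;

            /* account the cost of one page locally and in the shared state */
            pages[w]--;
            local[w]++;
            shared++;

            /* sleep only if both the shared and the local balance qualify */
            if (shared >= COST_LIMIT &&
                local[w] > THRESHOLD * COST_LIMIT / NWORKERS)
            {
                /* sleep proportional to this worker's own balance ... */
                delay[w] += COST_DELAY_MS * local[w] / COST_LIMIT;
                /* ... and charge only that share back to the shared balance */
                shared -= local[w];
                local[w] = 0;
            }
        }
    }

    for (w = 0; w < NWORKERS; w++)
        printf("worker %d delay=%f\n", w, delay[w]);
    return 0;
}

Running this shows the worker doing less I/O (worker 3) accumulating
proportionally less delay, which matches the behaviour seen in the
results above.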

We have tested this idea with various kinds of workloads, such as by
varying the shared buffer size, delays, and the number of workers.  We
have also tried different numbers of indexes and workers.  In all the
tests, we found that the workers are throttled in proportion to the
I/O done by each individual worker.


[1] - https://www.postgresql.org/message-id/CAA4eK1JvxBTWTPqHGx1X7in7j42ZYwuKOZUySzH3YMwTNRE-2Q%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1K9kCqLKbVA9KUuuarjj%2BsNYqrmf6UAFok5VTgZ8evWoA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: cost based vacuum (parallel)

From
Dilip Kumar
Date:
On Thu, Nov 14, 2019 at 5:02 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
> On Mon, 11 Nov 2019 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > ..
> > > > > I have tested the same with some other workload(test file attached).
> > > > > I can see the same behaviour with this workload as well that with the
> > > > > patch 4 the distribution of the delay is better compared to other
> > > > > patches i.e. worker with more I/O have more delay and with equal IO
> > > > > have alsomost equal delay.  Only thing is that the total delay with
> > > > > the patch 4 is slightly less compared to other pacthes.
> > > > >
> > > >
> > > > I see one problem with the formula you have used in the patch, maybe
> > > > that is causing the value of total delay to go down.
> > > >
> > > > - if (new_balance >= VacuumCostLimit)
> > > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > > + if ((new_balance >= VacuumCostLimit) &&
> > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > > >
> > > > As per discussion, the second part of the condition should be
> > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".
> > > My Bad
> > > I think
> > > > you can once change this and try again.  Also, please try with the
> > > > different values of threshold (0.3, 0.5, 0.7, etc.).
> > > >
> > > Okay, I will retest with both patch3 and path4 for both the scenarios.
> > > I will also try with different multipliers.
> > >
> >
> > One more thing, I think we should also test these cases with a varying
> > number of indexes (say 2,6,8,etc.) and then probably, we should test
> > by a varying number of workers where the number of workers are lesser
> > than indexes.  You can do these after finishing your previous
> > experiments.
>
> On the top of parallel vacuum patch, I applied Dilip's patch (0001-vacuum_costing_test.patch). I have tested by
> varying number of indexes and number of workers. I compared shared costing (0001-vacuum_costing_test.patch) vs shared
> costing latest patch (shared_costing_plus_patch4_v1.patch).
> With shared costing base patch, I can see that delay is not in sync compared to I/O which is resolved by applying
> patch (shared_costing_plus_patch4_v1.patch).  I have also observed that total delay is slightly reduced with
> shared_costing_plus_patch4_v1.patch patch.
>
> Below is the full testing summary:
> Test setup:
> step1) Apply parallel vacuum patch
> step2) Apply 0001-vacuum_costing_test.patch patch (on the top of this patch, delay is not in sync compared to I/O)
> step3) Apply shared_costing_plus_patch4_v1.patch (delay is in sync compared to I/O)
>
> Configuration settings:
> autovacuum = off
> max_parallel_workers = 30
> shared_buffers = 2GB
> max_parallel_maintenance_workers = 20
> vacuum_cost_limit = 2000
> vacuum_cost_delay = 10
>
> Test 1: Varry indexes(2,4,6,8) but parallel workers are fixed as 4:
>
> Case 1) When indexes are 2:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 1 delay=60.000000 total io=17931 hit=17891 miss=0 dirty=2
>
> With  shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=87.780000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 1 delay=87.995000 total io=17931 hit=17891 miss=0 dirty=2
>
> Case 2) When indexes are 4:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 1 delay=80.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 2 delay=60.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 3 delay=100.000000 total io=17931 hit=17891 miss=0 dirty=2
>
> With  shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=87.430000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 1 delay=87.175000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 2 delay=86.340000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 3 delay=88.020000 total io=17931 hit=17891 miss=0 dirty=2
>
> Case 3) When indexes are 6:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=110.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 1 delay=100.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 2 delay=160.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 3 delay=90.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 4 delay=80.000000 total io=17931 hit=17891 miss=0 dirty=2
>
> With  shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=173.195000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 1 delay=88.715000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 2 delay=87.710000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 3 delay=86.460000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 4 delay=89.435000 total io=17931 hit=17891 miss=0 dirty=2
>
> Case 4) When indexes are 8:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 1 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 2 delay=130.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 3 delay=190.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 4 delay=110.000000 total io=35862 hit=35782 miss=0 dirty=4
>
> With  shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=174.700000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 1 delay=177.880000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 2 delay=89.460000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING:  worker 3 delay=177.320000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 4 delay=86.810000 total io=17931 hit=17891 miss=0 dirty=2
>
> Test 2: Indexes are 16 but parallel workers are 2,4,8:
>
> Case 1) When 2 parallel workers:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=1513.230000 total io=307197 hit=85167 miss=22179 dirty=12
> WARNING:  worker 1 delay=1543.385000 total io=326553 hit=63133 miss=26322 dirty=10
> WARNING:  worker 2 delay=1633.625000 total io=302199 hit=65839 miss=23616 dirty=10
>
> With  shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=1539.475000 total io=308175 hit=65175 miss=24280 dirty=10
> WARNING:  worker 1 delay=1251.200000 total io=250692 hit=71562 miss=17893 dirty=10
> WARNING:  worker 2 delay=1143.690000 total io=228987 hit=93857 miss=13489 dirty=12
>
> Case 2) When 4 parallel workers:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=1182.430000 total io=213567 hit=16037 miss=19745 dirty=4
> WARNING:  worker 1 delay=1202.710000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING:  worker 2 delay=210.000000 total io=89655 hit=89455 miss=0 dirty=10
> WARNING:  worker 3 delay=270.000000 total io=71724 hit=71564 miss=0 dirty=8
> WARNING:  worker 4 delay=851.825000 total io=188229 hit=58619 miss=12945 dirty=8
>
> With  shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=1136.875000 total io=227679 hit=14469 miss=21313 dirty=4
> WARNING:  worker 1 delay=973.745000 total io=196881 hit=17891 miss=17891 dirty=4
> WARNING:  worker 2 delay=447.410000 total io=89655 hit=89455 miss=0 dirty=10
> WARNING:  worker 3 delay=833.235000 total io=168228 hit=40958 miss=12715 dirty=6
> WARNING:  worker 4 delay=683.200000 total io=136488 hit=64368 miss=7196 dirty=8
>
> Case 3) When 8 parallel workers:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=1022.300000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING:  worker 1 delay=1072.770000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING:  worker 2 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 3 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 4 delay=140.035000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 5 delay=200.000000 total io=53802 hit=53672 miss=1 dirty=6
> WARNING:  worker 6 delay=130.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 7 delay=150.000000 total io=53793 hit=53673 miss=0 dirty=6
>
> With  shared_costing_plus_patch4_v1.patch:
> WARNING:  worker 0 delay=872.800000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING:  worker 1 delay=885.950000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING:  worker 2 delay=175.680000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 3 delay=259.560000 total io=53793 hit=53673 miss=0 dirty=6
> WARNING:  worker 4 delay=169.945000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 5 delay=613.845000 total io=125100 hit=45750 miss=7923 dirty=6
> WARNING:  worker 6 delay=171.895000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING:  worker 7 delay=176.505000 total io=35862 hit=35782 miss=0 dirty=4

It seems that the bigger delay difference (8% - 9%) observed with the
higher number of indexes is due to the I/O difference; for example, in
case 3 the total page misses without the patch are 35780, whereas with
the patch they are 43703.  So it seems that with more indexes your
data does not fit in the shared buffers, so the page hits/misses vary
from run to run, and that causes variance in the total delay.  The
other issue, where the delay with the patch is 2-3% lower, is really a
problem of the "0001-vacuum_costing_test" patch, because that patch
displays only the delay during the index vacuuming phase, not the
total delay.  If we observe the total delay, it should be the same.  A
modified version of 0001-vacuum_costing_test that prints the total
delay is attached.

In my test.sh, I can see that the total delay is almost the same.

Non-parallel vacuum
WARNING:  VacuumCostTotalDelay = 11332.170000

Parallel vacuum with shared_costing_plus_patch4_v1.patch:
WARNING:  worker 0 delay=89.230000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 1 delay=85.205000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 2 delay=87.290000 total io=17931 hit=17891 miss=0 dirty=2
WARNING:  worker 3 delay=78.365000 total io=16378 hit=4318 miss=0 dirty=603

WARNING:  VacuumCostTotalDelay = 11331.690000

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: cost based vacuum (parallel)

From
Masahiko Sawada
Date:
On Fri, 15 Nov 2019 at 11:54, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 13, 2019 at 10:02 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I've done some tests while changing shared buffer size, delays and
> > number of workers. The overall results has the similar tendency as the
> > result shared by Dilip and looks reasonable to me.
> >
>
> Thanks, Sawada-san for repeating the tests.  I can see from yours,
> Dilip and Mahendra's testing that the delay is distributed depending
> upon the I/O done by a particular worker and the total I/O is also as
> expected in various kinds of scenarios.  So, I think this is a better
> approach.  Do you agree or you think we should still investigate more
> on another approach as well?
>
> I would like to summarize this approach.  The basic idea for parallel
> vacuum is to allow the parallel workers and master backend to have a
> shared view of vacuum cost related parameters (mainly
> VacuumCostBalance) and allow each worker to update it and then based
> on that decide whether it needs to sleep.  With this basic idea, we
> found that in some cases the throttling is not accurate as explained
> with an example in my email above [1] and then the tests performed by
> Dilip and others in the following emails (In short, the workers doing
> more I/O can be throttled less).  Then as discussed in an email later
> [2], we tried a way to avoid letting the workers sleep which has done
> less or no I/O as compared to other workers.  This ensured that
> workers who are doing more I/O got throttled more.  The idea is to
> allow any worker to sleep only if it has performed the I/O above a
> certain threshold and the overall balance is more than the cost_limit
> set by the system.  Then we will allow the worker to sleep
> proportional to the work done by it and reduce the
> VacuumSharedCostBalance by the amount which is consumed by the current
> worker.  This scheme leads to the desired throttling by different
> workers based on the work done by the individual worker.
>
> We have tested this idea with various kinds of workloads like by
> varying shared buffer size, delays and number of workers.  Then also,
> we have tried with a different number of indexes and workers.  In all
> the tests, we found that the workers are throttled proportional to the
> I/O being done by a particular worker.

Thank you for summarizing!

I agree with this approach.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services