Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Darafei "Komяpa" Praliaskouski
Subject Re: cost based vacuum (parallel)
Msg-id CAC8Q8tJXWS1BaZWtwkG8XFjab79oyOxXxNarZrAWSmmobKeE9w@mail.gmail.com
In response to cost based vacuum (parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: cost based vacuum (parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

> This is somewhat similar to a memory usage problem with a
> parallel query where each worker is allowed to use up to work_mem of
> memory.  We can say that the users using parallel operation can expect
> more system resources to be used as they want to get the operation
> done faster, so we are fine with this.  However, I am not sure if that
> is the right thing, so we should try to come up with some solution for
> it and if the solution is too complex, then probably we can think of
> documenting such behavior.

In cloud environments (Amazon + gp2) there's a budget on input/output operations. If you exceed it for a long time, everything starts to feel like you're working with a floppy disk.

For ease of configuration, I would want a "max_vacuum_disk_iops" setting that limits the number of input/output operations performed by all of the vacuums in the system combined. If I set it below the budget refill rate, I can be sure that no vacuum runs fast enough to impact any sibling query.
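To put numbers on that (an illustration only, using gp2's published refill rate of 3 IOPS per GiB): a 100 GiB volume refills its I/O credit bucket at 300 IOPS, so with a hypothetical max_vacuum_disk_iops = 150 all vacuums combined can never consume more than half of the refill, leaving a sustained 150 IOPS for sibling queries.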

There's also value in a non-throttled VACUUM for smaller tables. On gp2, such runs are paid for out of the surge budget, whose size is known to the sysadmin. Let's call the setting "max_vacuum_disk_surge_iops": if a relation has fewer blocks than this value and the situation is blocking in any way (anti-wraparound, interactive console, ...), go ahead and run without throttling.
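A minimal sketch of that check, assuming it runs in PostgreSQL's C code at the start of a vacuum; the GUC, the function name, and the is_blocking flag are all invented for illustration:

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "utils/rel.h"

    /* hypothetical GUC, in blocks / I/O operations */
    extern int max_vacuum_disk_surge_iops;

    /*
     * Sketch: skip cost-based throttling for a small relation when the
     * situation is blocking (anti-wraparound, interactive VACUUM, ...),
     * letting it run out of the surge budget instead.
     */
    static bool
    vacuum_may_skip_throttling(Relation rel, bool is_blocking)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(rel);

        return is_blocking &&
               nblocks < (BlockNumber) max_vacuum_disk_surge_iops;
    }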

As for how to balance the cost: if we know the number of vacuum processes that were running during the previous second, we can simply divide this second's slot by that number.
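The division itself is trivial; a sketch, assuming a shared counter of workers active during the previous second (both names invented):

    /* Split the global budget evenly among last second's vacuums. */
    int    slot = max_vacuum_disk_iops / Max(prev_second_vacuum_count, 1);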

To correct for overshoot, we can subtract the previous second's overshoot from the next second's slot. That would also account for surge budget usage and let it refill, pausing all autovacuum for some time after a manual vacuum.
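A sketch of that correction, continuing the hypothetical names from above; the Max() guards keep the slot non-negative, so a large overshoot simply pauses vacuum I/O until the debt is paid off:

    /* Charge last second's overshoot against this second's slot,
     * so the long-run rate converges on max_vacuum_disk_iops and
     * the surge bucket gets a chance to refill. */
    int    overshoot = Max(prev_second_iops_used - prev_second_slot, 0);
    int    this_second_slot = Max(slot - overshoot, 0);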

Accounting for and limiting the operation count more often than once a second wouldn't benefit this use case.

Please don't forget that processing one page can turn into several I/O operations (read, write, WAL).
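A sketch of per-page accounting in real operations rather than PostgreSQL's abstract vacuum_cost_page_* units; all three flags are hypothetical placeholders for facts the buffer manager already knows:

    int    page_iops = 0;

    if (had_to_read_page)        /* buffer wasn't in shared_buffers */
        page_iops++;
    if (page_was_dirtied)        /* will be written back eventually */
        page_iops++;
    if (emitted_wal_record)      /* WAL write (amortized by batching) */
        page_iops++;

    iops_used_this_second += page_iops;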

Does this make sense? :)

