Re: cost based vacuum (parallel) - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: cost based vacuum (parallel) |
Date | |
Msg-id | CAFiTN-snLc5qioVcY1MK8mB11LZqTftKctDx55x0RF2HvfnfSQ@mail.gmail.com |
In response to | Re: cost based vacuum (parallel) (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-hackers |
On Wed, Nov 6, 2019 at 9:21 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Amit Kapila (amit.kapila16@gmail.com) wrote:
> > On Tue, Nov 5, 2019 at 1:42 AM Stephen Frost <sfrost@snowman.net> wrote:
> > > * Andres Freund (andres@anarazel.de) wrote:
> > > > That's quite doable independent of parallelism, as we don't have tables
> > > > or indexes spanning more than one tablespace. True, you could then make
> > > > the processing of an individual vacuum faster by allowing to utilize
> > > > multiple tablespace budgets at the same time.
> > >
> > > Yes, it's possible to do independent of parallelism, but what I was
> > > trying to get at above is that it might not be worth the effort. When
> > > it comes to parallel vacuum though, I'm not sure that you can just punt
> > > on this question since you'll naturally end up spanning multiple
> > > tablespaces concurrently, at least if the heap+indexes are spread across
> > > multiple tablespaces and you're operating against more than one of those
> > > relations at a time
> >
> > Each parallel worker operates on a separate index. It might be worth
> > exploring per-tablespace vacuum throttling, but that should not be a
> > requirement for the currently proposed patch.
>
> Right, that each operates on a separate index in parallel is what I had
> figured was probably happening, and that's why I brought up the question
> of "well, what does IO throttling mean when you've got multiple
> tablespaces involved with presumably independent IO channels...?" (or,
> at least, that's what I was trying to go for).
>
> This isn't a question with the current system and way the code works
> within a single vacuum operation, as we're never operating on more than
> one relation concurrently in that case.
>
> Of course, we don't currently do anything to manage IO utilization
> across tablespaces when there are multiple autovacuum workers running
> concurrently, which I suppose goes to Andres' point that we aren't
> really doing anything to deal with this today and therefore this is
> perhaps not all that new of an issue just with the addition of
> parallel vacuum. I'd still argue that it becomes a lot more apparent
> when you're talking about one parallel vacuum, but ultimately we should
> probably be thinking about how to manage the resources across all the
> vacuums and tablespaces and queries and such.
>
> In an ideal world, we'd track the i/o from front-end queries, have some
> idea of the total i/o possible for each IO channel, and allow vacuum and
> whatever other background processes need to run to scale up and down,
> with enough buffer to avoid ever being maxed out on i/o, but keeping up
> a consistent rate of i/o that lets everything finish as quickly as
> possible.

IMHO, suppose that in the future we improve I/O throttling per
tablespace, perhaps by maintaining an independent balance for the
relation and for each of its indexes, or perhaps a combined balance for
the indexes that are on the same tablespace. Each balance could then be
checked against its tablespace's I/O limit. If we get such a mechanism
in the future, it seems that it would be easy to extend to parallel
vacuum, wouldn't it? Across workers, too, we could track a
tablespace-wise shared balance (if we go with the shared costing
approach, for example).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
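
To make the tablespace-wise shared balance idea above a bit more concrete, here is a minimal sketch in Python. It is not PostgreSQL code and not part of any proposed patch: the names `TablespaceBalances`, `charge`, and `count_naps`, the tablespace names, and the limit values are all hypothetical, chosen only for illustration. It assumes that every worker adds its page costs to one balance per tablespace, and that a worker naps (and the balance resets) when that balance reaches the tablespace's limit. Locking or shared memory between real workers is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TablespaceBalances:
    """Hypothetical shared state: one cost limit and one balance per tablespace."""
    limits: dict  # tablespace name -> cost limit (assumed per-tablespace setting)
    balances: dict = field(default_factory=lambda: defaultdict(int))

    def charge(self, tablespace: str, cost: int) -> bool:
        """Add `cost` to the tablespace's shared balance.

        Returns True when the balance has reached the tablespace's limit,
        meaning the calling worker should nap; the balance is then reset.
        """
        self.balances[tablespace] += cost
        if self.balances[tablespace] >= self.limits[tablespace]:
            self.balances[tablespace] = 0
            return True
        return False


def count_naps(state: TablespaceBalances, charges) -> dict:
    """Replay (tablespace, cost) charges from any number of workers
    and count how many naps each tablespace's limit would trigger."""
    naps = defaultdict(int)
    for tablespace, cost in charges:
        if state.charge(tablespace, cost):
            naps[tablespace] += 1
    return dict(naps)


# Two workers, each vacuuming an index on a different tablespace,
# interleaved; every charge costs 10 (an illustrative value).
state = TablespaceBalances(limits={"ts_a": 200, "ts_b": 50})
naps = count_naps(state, [("ts_a", 10), ("ts_b", 10)] * 20)
```

Because each balance is keyed by tablespace rather than by worker, several workers on indexes in the same tablespace would draw on the same budget, while workers on different tablespaces would be throttled independently.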