Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From: Dilip Kumar
Subject: Re: [HACKERS] Block level parallel vacuum
Msg-id: CAFiTN-sp3x21VUTQHy-pPieg+Wa_UEKLYtxtvfE_y9i2j3c5Bw@mail.gmail.com
In response to: Re: [HACKERS] Block level parallel vacuum (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: [HACKERS] Block level parallel vacuum
List: pgsql-hackers
On Wed, Dec 4, 2019 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > In your code, I think if two workers enter the compute_parallel_delay
> > function at the same time, they add their local balance to
> > VacuumSharedCostBalance and both workers sleep because both values
> > reach the VacuumCostLimit.
>
> True, but isn't it more appropriate, because the local cost of any
> worker should ideally be added to the shared cost as soon as it is
> incurred? I mean to say that we are not adding any cost to the shared
> balance without actually incurring it. Then we also consider the
> individual worker's local balance and sleep according to that local
> balance.

Even I think it is better to add the balance to the shared balance at
the earliest opportunity. Just consider the case where there are 5
workers, each with an I/O balance of 20, and VacuumCostLimit is 50.
Their combined balance is actually 100 (double the VacuumCostLimit),
but if we don't add it immediately then none of the workers will sleep,
and the excess may spill into the next cycle, which is not very good.
OTOH, if we add the 20 immediately and then check the shared balance,
all the workers whose local balances have reached the limit might go to
sleep, but each will sleep only in proportion to its local balance. So
IMHO, adding the current balance to the shared balance early is closer
to the model we are trying to implement, i.e. shared cost accounting.

> > > 2. I think if we can somehow disallow very small indexes to use
> > > parallel workers, then it will be better. Can we use
> > > min_parallel_index_scan_size to decide whether a particular index
> > > can participate in a parallel vacuum?
> >
> > I think it's a good idea, but I'm concerned that the default value of
> > min_parallel_index_scan_size, 512kB, is too small for parallel vacuum
> > purposes. Given that people who want to use parallel vacuum are likely
> > to have a big table, the only indexes that could be skipped by the
> > default value would be brin indexes, I think.
>
> Yeah, or probably hash indexes in some cases.
>
> > Also, I guess the reason why the default value is small is that
> > min_parallel_index_scan_size is compared to the number of blocks being
> > scanned during an index scan, not the whole index. On the other hand,
> > in parallel vacuum we will compare it to the whole index's blocks,
> > because index vacuuming is always a full scan. So I'm also concerned
> > that users will get confused about a reasonable setting.
>
> This setting is about how much of the index we are going to scan, so I
> am not sure it matters whether it is a partial or full index scan.
> Also, in an index scan we will launch multiple workers to scan that
> index, whereas here we will consider launching just one worker.
>
> > As another idea, how about using min_parallel_table_scan_size instead?
>
> Hmm, yeah, that can be another option, but it might not be a good idea
> for partial indexes.
>
> > That is, we cannot do parallel vacuum on a table smaller than that
> > value.
>
> Yeah, that makes sense, but I feel that if we can directly target the
> index scan size, that may be a better option. If we can't use
> min_parallel_index_scan_size, then we can consider this.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
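
To make the shared cost accounting model discussed above concrete, here
is a minimal standalone C sketch. It is illustrative only: the names
mirror the thread, but the constants, the proportional-sleep formula,
and the reset policy after sleeping are assumptions, not the actual
patch. Each worker publishes its local balance to the shared balance as
soon as the cost is incurred, and once the shared balance crosses the
limit it sleeps in proportion to its own contribution.

    /* Illustrative sketch only -- not PostgreSQL source code. */
    #include <stdatomic.h>
    #include <unistd.h>

    #define VACUUM_COST_LIMIT     50     /* stand-in for VacuumCostLimit */
    #define VACUUM_COST_DELAY_MS  20.0   /* stand-in for VacuumCostDelay */

    /* Analogue of VacuumSharedCostBalance from the thread. */
    static atomic_uint shared_cost_balance;

    /* Each worker calls this with the I/O cost it has accrued locally. */
    static void
    maybe_vacuum_delay(unsigned local_balance)
    {
        /* Publish the local cost to the shared balance immediately. */
        unsigned total =
            atomic_fetch_add(&shared_cost_balance, local_balance)
            + local_balance;

        if (total >= VACUUM_COST_LIMIT)
        {
            /* Sleep in proportion to this worker's own contribution. */
            double msec = VACUUM_COST_DELAY_MS *
                ((double) local_balance / VACUUM_COST_LIMIT);

            usleep((useconds_t) (msec * 1000.0));

            /* Assumed reset policy: withdraw only what this worker added. */
            atomic_fetch_sub(&shared_cost_balance, local_balance);
        }
    }

With the 5-worker example above (each worker at 20, limit 50), if all
five publish before anyone checks, the shared balance hits 100 and every
worker sleeps for 20/50 of the base delay; without the immediate add, no
worker's private view would reach 50 and nobody would sleep in this
cycle.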
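
On the min_parallel_index_scan_size idea, the gating check being debated
would presumably look something like the following sketch. This is
hypothetical: whether the GUC is reused at all, and what it is compared
against, is exactly what is open in the thread.

    /* Hypothetical gating check; not a settled interface. */
    #include "postgres.h"
    #include "optimizer/paths.h"   /* min_parallel_index_scan_size, in blocks */
    #include "storage/bufmgr.h"    /* RelationGetNumberOfBlocks() */
    #include "utils/rel.h"

    /*
     * Index vacuuming is always a full scan, so unlike a parallel index
     * scan we would compare the entire index size against the threshold.
     */
    static bool
    index_worth_parallel_vacuum(Relation indrel)
    {
        return RelationGetNumberOfBlocks(indrel) >=
            (BlockNumber) min_parallel_index_scan_size;
    }

At the default of 512kB the threshold is only 64 blocks (of 8kB each),
which is why the concern above is that in practice only brin, or
sometimes hash, indexes would ever be skipped.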