Re: autovac issue with large number of tables - Mailing list pgsql-hackers
From | Kasahara Tatsuhito |
---|---|
Subject | Re: autovac issue with large number of tables |
Date | |
Msg-id | CAP0=ZVLK0_UQfagn1KBS+_ADfa4e+q7MNrzbRZUqbc6jEZoy4g@mail.gmail.com Whole thread Raw |
In response to | Re: autovac issue with large number of tables (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: autovac issue with large number of tables
|
List | pgsql-hackers |
On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > <kasahara.tatsuhito@gmail.com> wrote: > > > > Hi, > > > > On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > > > <kasahara.tatsuhito@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > > > > <kasahara.tatsuhito@gmail.com> wrote: > > > > > > I wonder if we could have table_recheck_autovac do two probes of the stats > > > > > > data. First probe the existing stats data, and if it shows the table to > > > > > > be already vacuumed, return immediately. If not, *then* force a stats > > > > > > re-read, and check a second time. > > > > > Does the above mean that the second and subsequent table_recheck_autovac() > > > > > will be improved to first check using the previous refreshed statistics? > > > > > I think that certainly works. > > > > > > > > > > If that's correct, I'll try to create a patch for the PoC > > > > > > > > I still don't know how to reproduce Jim's troubles, but I was able to reproduce > > > > what was probably a very similar problem. > > > > > > > > This problem seems to be more likely to occur in cases where you have > > > > a large number of tables, > > > > i.e., a large amount of stats, and many small tables need VACUUM at > > > > the same time. > > > > > > > > So I followed Tom's advice and created a patch for the PoC. > > > > This patch will enable a flag in the table_recheck_autovac function to use > > > > the existing stats next time if VACUUM (or ANALYZE) has already been done > > > > by another worker on the check after the stats have been updated. > > > > If the tables continue to require VACUUM after the refresh, then a refresh > > > > will be required instead of using the existing statistics. > > > > > > > > I did simple test with HEAD and HEAD + this PoC patch. > > > > The tests were conducted in two cases. > > > > (I changed few configurations. see attached scripts) > > > > > > > > 1. Normal VACUUM case > > > > - SET autovacuum = off > > > > - CREATE tables with 100 rows > > > > - DELETE 90 rows for each tables > > > > - SET autovacuum = on and restart PostgreSQL > > > > - Measure the time it takes for all tables to be VACUUMed > > > > > > > > 2. Anti wrap round VACUUM case > > > > - CREATE brank tables > > > > - SELECT all of these tables (for generate stats) > > > > - SET autovacuum_freeze_max_age to low values and restart PostgreSQL > > > > - Consumes a lot of XIDs by using txid_curent() > > > > - Measure the time it takes for all tables to be VACUUMed > > > > > > > > For each test case, the following results were obtained by changing > > > > autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. > > > > Also changing num of tables to 1000, 5000, 10000 and 20000. > > > > > > > > Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable, > > > > but I think it's enough to ask for a trend. > > > > > > > > =========================================================================== > > > > [1.Normal VACUUM case] > > > > tables:1000 > > > > autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec > > > > autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec > > > > autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec > > > > autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec > > > > autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec > > > > > > > > tables:5000 > > > > autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec > > > > autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec > > > > autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec > > > > autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec > > > > autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec > > > > > > > > tables:10000 > > > > autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec > > > > autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec > > > > autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec > > > > autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec > > > > autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec > > > > > > > > tables:20000 > > > > autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec > > > > autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec > > > > autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec > > > > autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec > > > > autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec > > > > > > > > [2.Anti wrap round VACUUM case] > > > > tables:1000 > > > > autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec > > > > autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec > > > > autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec > > > > autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec > > > > autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec > > > > > > > > tables:5000 > > > > autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec > > > > autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec > > > > autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec > > > > autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec > > > > autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec > > > > > > > > tables:10000 > > > > autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec > > > > autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec > > > > autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec > > > > autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec > > > > autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec > > > > > > > > tables:20000 > > > > autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec > > > > autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec > > > > autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec > > > > autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec > > > > autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec > > > > =========================================================================== > > > > > > > > The cases without patch, the scalability of the worker has decreased > > > > as the number of tables has increased. > > > > In fact, the more workers there are, the longer it takes to complete > > > > VACUUM to all tables. > > > > The cases with patch, it shows good scalability with respect to the > > > > number of workers. > > > > > > It seems a good performance improvement even without the patch of > > > shared memory based stats collector. > > > > > > > > > > > Note that perf top results showed that hash_search_with_hash_value, > > > > hash_seq_search and > > > > pgstat_read_statsfiles are dominant during VACUUM in all patterns, > > > > with or without the patch. > > > > > > > > Therefore, there is still a need to find ways to optimize the reading > > > > of large amounts of stats. > > > > However, this patch is effective in its own right, and since there are > > > > only a few parts to modify, > > > > I think it should be able to be applied to current (preferably > > > > pre-v13) PostgreSQL. > > > > > > +1 > > > > > > + > > > + /* We might be better to refresh stats */ > > > + use_existing_stats = false; > > > } > > > + else > > > + { > > > > > > - heap_freetuple(classTup); > > > + heap_freetuple(classTup); > > > + /* The relid has already vacuumed, so we might be better to > > > use exiting stats */ > > > + use_existing_stats = true; > > > + } > > > > > > With that patch, the autovacuum process refreshes the stats in the > > > next check if it finds out that the table still needs to be vacuumed. > > > But I guess it's not necessarily true because the next table might be > > > vacuumed already. So I think we might want to always use the existing > > > for the first check. What do you think? > > Thanks for your comment. > > > > If we assume the case where some workers vacuum on large tables > > and a single worker vacuum on small tables, the processing > > performance of the single worker will be slightly lower if the > > existing statistics are checked every time. > > > > In fact, at first I tried to check the existing stats every time, > > but the performance was slightly worse in cases with a small number of workers. > > (Checking the existing stats is lightweight , but at high frequency, > > it affects processing performance.) > > Therefore, at after refresh statistics, determine whether autovac > > should use the existing statistics. > > Yeah, since the test you used uses a lot of small tables, if there are > a few workers, checking the existing stats is unlikely to return true > (no need to vacuum). So the cost of existing stats check ends up being > overhead. Not sure how slow always checking the existing stats was, > but given that the shared memory based stats collector patch could > improve the performance of refreshing stats, it might be better not to > check the existing stats frequently like the patch does. Anyway, I > think it’s better to evaluate the performance improvement with other > cases too. Yeah, I would like to see how much the performance changes in other cases. In addition, if the shared-based-stats patch is applied, we won't need to reload a huge stats file, so we will just have to check the stats on shared-mem every time. Perhaps the logic of table_recheck_autovac could be simpler. > > BTW, I found some typos in comments, so attache a fixed version. > > Thank you for updating the patch! I'll also run the performance test > you shared with the latest version patch. Thank you! It's very helpful. Best regards, -- Tatsuhito Kasahara kasahara.tatsuhito _at_ gmail.com
pgsql-hackers by date: