Re: autovac issue with large number of tables - Mailing list pgsql-hackers
From | Kasahara Tatsuhito |
---|---|
Subject | Re: autovac issue with large number of tables |
Date | |
Msg-id | CAP0=ZVK=bP29UpkHGYP2z-TcctTEt2=2gjJOoaRxNO6u4OFShg@mail.gmail.com Whole thread Raw |
In response to | Re: autovac issue with large number of tables (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: autovac issue with large number of tables
|
List | pgsql-hackers |
Hi, On Thu, Dec 3, 2020 at 9:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/12/03 11:46, Kasahara Tatsuhito wrote: > > On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> > >> On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>> > >>> > >>> > >>> On 2020/12/02 12:53, Masahiko Sawada wrote: > >>>> On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>>>> > >>>>> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 2020/12/01 16:23, Masahiko Sawada wrote: > >>>>>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito > >>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote: > >>>>>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito > >>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hi, Thanks for you comments. > >>>>>>>>>>> > >>>>>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote: > >>>>>>>>>>>>> Hi, > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: > >>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > >>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > >>>>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > >>>>>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of the stats > >>>>>>>>>>>>>>>>>>>>> data. First probe the existing stats data, and if it shows the table to > >>>>>>>>>>>>>>>>>>>>> be already vacuumed, return immediately. If not, *then* force a stats > >>>>>>>>>>>>>>>>>>>>> re-read, and check a second time. > >>>>>>>>>>>>>>>>>>>> Does the above mean that the second and subsequent table_recheck_autovac() > >>>>>>>>>>>>>>>>>>>> will be improved to first check using the previous refreshed statistics? > >>>>>>>>>>>>>>>>>>>> I think that certainly works. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to reproduce > >>>>>>>>>>>>>>>>>>> what was probably a very similar problem. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> This problem seems to be more likely to occur in cases where you have > >>>>>>>>>>>>>>>>>>> a large number of tables, > >>>>>>>>>>>>>>>>>>> i.e., a large amount of stats, and many small tables need VACUUM at > >>>>>>>>>>>>>>>>>>> the same time. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> So I followed Tom's advice and created a patch for the PoC. > >>>>>>>>>>>>>>>>>>> This patch will enable a flag in the table_recheck_autovac function to use > >>>>>>>>>>>>>>>>>>> the existing stats next time if VACUUM (or ANALYZE) has already been done > >>>>>>>>>>>>>>>>>>> by another worker on the check after the stats have been updated. > >>>>>>>>>>>>>>>>>>> If the tables continue to require VACUUM after the refresh, then a refresh > >>>>>>>>>>>>>>>>>>> will be required instead of using the existing statistics. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I did simple test with HEAD and HEAD + this PoC patch. > >>>>>>>>>>>>>>>>>>> The tests were conducted in two cases. > >>>>>>>>>>>>>>>>>>> (I changed few configurations. see attached scripts) > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 1. Normal VACUUM case > >>>>>>>>>>>>>>>>>>> - SET autovacuum = off > >>>>>>>>>>>>>>>>>>> - CREATE tables with 100 rows > >>>>>>>>>>>>>>>>>>> - DELETE 90 rows for each tables > >>>>>>>>>>>>>>>>>>> - SET autovacuum = on and restart PostgreSQL > >>>>>>>>>>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 2. Anti wrap round VACUUM case > >>>>>>>>>>>>>>>>>>> - CREATE brank tables > >>>>>>>>>>>>>>>>>>> - SELECT all of these tables (for generate stats) > >>>>>>>>>>>>>>>>>>> - SET autovacuum_freeze_max_age to low values and restart PostgreSQL > >>>>>>>>>>>>>>>>>>> - Consumes a lot of XIDs by using txid_curent() > >>>>>>>>>>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> For each test case, the following results were obtained by changing > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. > >>>>>>>>>>>>>>>>>>> Also changing num of tables to 1000, 5000, 10000 and 20000. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable, > >>>>>>>>>>>>>>>>>>> but I think it's enough to ask for a trend. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> =========================================================================== > >>>>>>>>>>>>>>>>>>> [1.Normal VACUUM case] > >>>>>>>>>>>>>>>>>>> tables:1000 > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> tables:5000 > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> tables:10000 > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> tables:20000 > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> [2.Anti wrap round VACUUM case] > >>>>>>>>>>>>>>>>>>> tables:1000 > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> tables:5000 > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> tables:10000 > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> tables:20000 > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec > >>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec > >>>>>>>>>>>>>>>>>>> =========================================================================== > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> The cases without patch, the scalability of the worker has decreased > >>>>>>>>>>>>>>>>>>> as the number of tables has increased. > >>>>>>>>>>>>>>>>>>> In fact, the more workers there are, the longer it takes to complete > >>>>>>>>>>>>>>>>>>> VACUUM to all tables. > >>>>>>>>>>>>>>>>>>> The cases with patch, it shows good scalability with respect to the > >>>>>>>>>>>>>>>>>>> number of workers. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> It seems a good performance improvement even without the patch of > >>>>>>>>>>>>>>>>>> shared memory based stats collector. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Sounds great! > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Note that perf top results showed that hash_search_with_hash_value, > >>>>>>>>>>>>>>>>>>> hash_seq_search and > >>>>>>>>>>>>>>>>>>> pgstat_read_statsfiles are dominant during VACUUM in all patterns, > >>>>>>>>>>>>>>>>>>> with or without the patch. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Therefore, there is still a need to find ways to optimize the reading > >>>>>>>>>>>>>>>>>>> of large amounts of stats. > >>>>>>>>>>>>>>>>>>> However, this patch is effective in its own right, and since there are > >>>>>>>>>>>>>>>>>>> only a few parts to modify, > >>>>>>>>>>>>>>>>>>> I think it should be able to be applied to current (preferably > >>>>>>>>>>>>>>>>>>> pre-v13) PostgreSQL. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> +1 > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>>> + /* We might be better to refresh stats */ > >>>>>>>>>>>>>>>>>> + use_existing_stats = false; > >>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>> + else > >>>>>>>>>>>>>>>>>> + { > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> - heap_freetuple(classTup); > >>>>>>>>>>>>>>>>>> + heap_freetuple(classTup); > >>>>>>>>>>>>>>>>>> + /* The relid has already vacuumed, so we might be better to > >>>>>>>>>>>>>>>>>> use exiting stats */ > >>>>>>>>>>>>>>>>>> + use_existing_stats = true; > >>>>>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> With that patch, the autovacuum process refreshes the stats in the > >>>>>>>>>>>>>>>>>> next check if it finds out that the table still needs to be vacuumed. > >>>>>>>>>>>>>>>>>> But I guess it's not necessarily true because the next table might be > >>>>>>>>>>>>>>>>>> vacuumed already. So I think we might want to always use the existing > >>>>>>>>>>>>>>>>>> for the first check. What do you think? > >>>>>>>>>>>>>>>>> Thanks for your comment. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> If we assume the case where some workers vacuum on large tables > >>>>>>>>>>>>>>>>> and a single worker vacuum on small tables, the processing > >>>>>>>>>>>>>>>>> performance of the single worker will be slightly lower if the > >>>>>>>>>>>>>>>>> existing statistics are checked every time. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> In fact, at first I tried to check the existing stats every time, > >>>>>>>>>>>>>>>>> but the performance was slightly worse in cases with a small number of workers. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Do you have this benchmark result? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> (Checking the existing stats is lightweight , but at high frequency, > >>>>>>>>>>>>>>>>> it affects processing performance.) > >>>>>>>>>>>>>>>>> Therefore, at after refresh statistics, determine whether autovac > >>>>>>>>>>>>>>>>> should use the existing statistics. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Yeah, since the test you used uses a lot of small tables, if there are > >>>>>>>>>>>>>>>> a few workers, checking the existing stats is unlikely to return true > >>>>>>>>>>>>>>>> (no need to vacuum). So the cost of existing stats check ends up being > >>>>>>>>>>>>>>>> overhead. Not sure how slow always checking the existing stats was, > >>>>>>>>>>>>>>>> but given that the shared memory based stats collector patch could > >>>>>>>>>>>>>>>> improve the performance of refreshing stats, it might be better not to > >>>>>>>>>>>>>>>> check the existing stats frequently like the patch does. Anyway, I > >>>>>>>>>>>>>>>> think it’s better to evaluate the performance improvement with other > >>>>>>>>>>>>>>>> cases too. > >>>>>>>>>>>>>>> Yeah, I would like to see how much the performance changes in other cases. > >>>>>>>>>>>>>>> In addition, if the shared-based-stats patch is applied, we won't need to reload > >>>>>>>>>>>>>>> a huge stats file, so we will just have to check the stats on > >>>>>>>>>>>>>>> shared-mem every time. > >>>>>>>>>>>>>>> Perhaps the logic of table_recheck_autovac could be simpler. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> BTW, I found some typos in comments, so attache a fixed version. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The patch adds some duplicated codes into table_recheck_autovac(). > >>>>>>>>>>>>>> It's better to make the common function performing them and make > >>>>>>>>>>>>>> table_recheck_autovac() call that common function, to simplify the code. > >>>>>>>>>>>>> Thanks for your comment. > >>>>>>>>>>>>> Hmm.. I've cut out the duplicate part. > >>>>>>>>>>>>> Attach the patch. > >>>>>>>>>>>>> Could you confirm that it fits your expecting? > >>>>>>>>>>>> > >>>>>>>>>>>> Yes, thanks for updataing the patch! Here are another review comments. > >>>>>>>>>>>> > >>>>>>>>>>>> + shared = pgstat_fetch_stat_dbentry(InvalidOid); > >>>>>>>>>>>> + dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId); > >>>>>>>>>>>> > >>>>>>>>>>>> When using the existing stats, ISTM that these are not necessary and > >>>>>>>>>>>> we can reuse "shared" and "dbentry" obtained before. Right? > >>>>>>>>>>> Yeah, but unless autovac_refresh_stats() is called, these functions > >>>>>>>>>>> read the information from the > >>>>>>>>>>> local hash table without re-read the stats file, so the process is very light. > >>>>>>>>>>> Therefore, I think, it is better to keep the current logic to keep the > >>>>>>>>>>> code simple. > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> + /* We might be better to refresh stats */ > >>>>>>>>>>>> + use_existing_stats = false; > >>>>>>>>>>>> > >>>>>>>>>>>> I think that we should add more comments about why it's better to > >>>>>>>>>>>> refresh the stats in this case. > >>>>>>>>>>>> > >>>>>>>>>>>> + /* The relid has already vacuumed, so we might be better to use existing stats */ > >>>>>>>>>>>> + use_existing_stats = true; > >>>>>>>>>>>> > >>>>>>>>>>>> I think that we should add more comments about why it's better to > >>>>>>>>>>>> reuse the stats in this case. > >>>>>>>>>>> I added comments. > >>>>>>>>>>> > >>>>>>>>>>> Attache the patch. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Thank you for updating the patch. Here are some small comments on the > >>>>>>>>>> latest (v4) patch. > >>>>>>>>>> > >>>>>>>>>> + * So if the last time we checked a table that was already vacuumed after > >>>>>>>>>> + * refres stats, check the current statistics before refreshing it. > >>>>>>>>>> + */ > >>>>>>>>>> > >>>>>>>>>> s/refres/refresh/ > >>>>>>>> Thanks! fixed. > >>>>>>>> Attached the patch. > >>>>>>>> > >>>>>>>>>> > >>>>>>>>>> ----- > >>>>>>>>>> +/* Counter to determine if statistics should be refreshed */ > >>>>>>>>>> +static bool use_existing_stats = false; > >>>>>>>>>> + > >>>>>>>>>> > >>>>>>>>>> I think 'use_existing_stats' can be declared within table_recheck_autovac(). > >>>>>>>>>> > >>>>>>>>>> ----- > >>>>>>>>>> While testing the performance, I realized that the statistics are > >>>>>>>>>> reset every time vacuumed one table, leading to re-reading the stats > >>>>>>>>>> file even if 'use_existing_stats' is true. Please refer that vacuum() > >>>>>>>>>> eventually calls AtEOXact_PgStat() which calls to > >>>>>>>>>> pgstat_clear_snapshot(). > >>>>>>>>> > >>>>>>>>> Good catch! > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> I believe that's why the performance of the > >>>>>>>>>> method of always checking the existing stats wasn’t good as expected. > >>>>>>>>>> So if we save the statistics somewhere and use it for rechecking, the > >>>>>>>>>> results of the performance benchmark will differ between these two > >>>>>>>>>> methods. > >>>>>>>> Thanks for you checks. > >>>>>>>> But, if a worker did vacuum(), that means this worker had determined > >>>>>>>> need vacuum in the > >>>>>>>> table_recheck_autovac(). So, use_existing_stats set to false, and next > >>>>>>>> time, refresh stats. > >>>>>>>> Therefore I think the current patch is fine, as we want to avoid > >>>>>>>> unnecessary refreshing of > >>>>>>>> statistics before the actual vacuum(), right? > >>>>>>> > >>>>>>> Yes, you're right. > >>>>>>> > >>>>>>> When I benchmarked the performance of the method of always checking > >>>>>>> existing stats I edited your patch so that it sets use_existing_stats > >>>>>>> = true even if the second check is false (i.g., vacuum is needed). > >>>>>>> And the result I got was worse than expected especially in the case of > >>>>>>> a few autovacuum workers. But it doesn't evaluate the performance of > >>>>>>> that method rightly as the stats snapshot is cleared every time > >>>>>>> vacuum. Given you had similar results, I guess you used a similar way > >>>>>>> when evaluating it, is it right? If so, it’s better to fix this issue > >>>>>>> and see how the performance benchmark results will differ. > >>>>>>> > >>>>>>> For example, the results of the test case with 10000 tables and 1 > >>>>>>> autovacuum worker I reported before was: > >>>>>>> > >>>>>>> 10000 tables: > >>>>>>> autovac_workers 1 : 158s,157s, 290s > >>>>>>> > >>>>>>> But after fixing that issue in the third method (always checking the > >>>>>>> existing stats), the results are: > >>>>>> > >>>>>> Could you tell me how you fixed that issue? You copied the stats to > >>>>>> somewhere as you suggested or skipped pgstat_clear_snapshot() as > >>>>>> I suggested? > >>>>> > >>>>> I used the way you suggested in this quick test; skipped > >>>>> pgstat_clear_snapshot(). > >>>>> > >>>>>> > >>>>>> Kasahara-san seems not to like the latter idea because it might > >>>>>> cause bad side effect. So we should use the former idea? > >>>>> > >>>>> Not sure. I'm also concerned about the side effect but I've not checked yet. > >>>>> > >>>>> Since probably there is no big difference between the two ways in > >>>>> terms of performance I'm going to see how the performance benchmark > >>>>> result will change first. > >>>> > >>>> I've tested performance improvement again. From the left the execution > >>>> time of the current HEAD, Kasahara-san's patch, the method of always > >>>> checking the existing stats (using approach suggested by Fujii-san), > >>>> in seconds. > >>>> > >>>> 1000 tables: > >>>> autovac_workers 1 : 13s, 13s, 13s > >>>> autovac_workers 2 : 6s, 4s, 4s > >>>> autovac_workers 3 : 3s, 4s, 3s > >>>> autovac_workers 5 : 3s, 3s, 2s > >>>> autovac_workers 10: 2s, 3s, 2s > >>>> > >>>> 5000 tables: > >>>> autovac_workers 1 : 71s, 71s, 72s > >>>> autovac_workers 2 : 37s, 32s, 32s > >>>> autovac_workers 3 : 29s, 26s, 26s > >>>> autovac_workers 5 : 20s, 19s, 18s > >>>> autovac_workers 10: 13s, 8s, 8s > >>>> > >>>> 10000 tables: > >>>> autovac_workers 1 : 158s,157s, 159s > >>>> autovac_workers 2 : 80s, 53s, 78s > >>>> autovac_workers 3 : 75s, 67s, 67s > >>>> autovac_workers 5 : 61s, 42s, 42s > >>>> autovac_workers 10: 69s, 26s, 25s > >>>> > >>>> 20000 tables: > >>>> autovac_workers 1 : 379s, 380s, 389s > >>>> autovac_workers 2 : 236s, 232s, 233s > >>>> autovac_workers 3 : 222s, 181s, 182s > >>>> autovac_workers 5 : 212s, 132s, 139s > >>>> autovac_workers 10: 317s, 91s, 89s > >>>> > >>>> I don't see a big difference between Kasahara-san's patch and the > >>>> method of always checking the existing stats. > >>> > >>> Thanks for doing the benchmark! > >>> > >>> This benchmark result makes me think that we don't need to tweak > >>> AtEOXact_PgStat() and can use Kasahara-san approach. > >>> That's good news :) > >> > >> Yeah, given that all autovaucum workers have the list of tables to > >> vacuum in the same order in most cases, the assumption in > >> Kasahara-san’s patch that if a worker needs to vacuum a table it’s > >> unlikely that it will be able to skip the next table using the current > >> snapshot of stats makes sense to me. > >> > >> One small comment on v6 patch: > >> > >> + /* When we decide to do vacuum or analyze, the existing stats cannot > >> + * be reused in the next cycle because it's cleared at the end of vacuum > >> + * or analyze (by AtEOXact_PgStat()). > >> + */ > >> + use_existing_stats = false; > >> > >> I think the comment should start on the second line (i.g., \n is > >> needed after /*). > > Oops, thanks. > > Fixed. > > Thanks for updating the patch! > > I applied the following cosmetic changes to the patch. > Attached is the updated version of the patch. > Coud you review this version? Thanks for tweaking the patch. > - Ran pgindent to fix some warnings that "git diff --check" > reported on the patch. > - Made the order of arguments consistent between > recheck_relation_needs_vacanalyze and relation_needs_vacanalyze. > - Renamed the variable use_existing_stats to reuse_stats for simplicity. > - Added more comments. I think it's no problem. The patch passed makecheck, and I benchmarked "Anti wrap round VACUUM case" (only 20000 tables) just in case. From the left the execution time of the current HEAD, v8 patch. tables 20000: autovac workers 1: 319sec, 315sec autovac workers 2: 301sec, 190sec autovac workers 3: 270sec, 133sec autovac workers 5: 211sec, 86sec autovac workers 10: 376sec, 68sec It's as expected. > Barring any objection, I'm thinking to commit this version. +1 Best regards, > Regards, > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION -- Tatsuhito Kasahara kasahara.tatsuhito _at_ gmail.com
pgsql-hackers by date: