Re: autovac issue with large number of tables - Mailing list pgsql-hackers
From | Kasahara Tatsuhito |
---|---|
Subject | Re: autovac issue with large number of tables |
Date | |
Msg-id | CAP0=ZVLf4btOHr_LEaga8WqVYvb9WsbSyF98Z7vL2wNOE2dpDg@mail.gmail.com Whole thread Raw |
In response to | Re: autovac issue with large number of tables (Fujii Masao <masao.fujii@oss.nttdata.com>) |
List | pgsql-hackers |
On Wed, Dec 9, 2020 at 12:01 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/12/04 12:21, Kasahara Tatsuhito wrote: > > Hi, > > > > On Thu, Dec 3, 2020 at 9:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/12/03 11:46, Kasahara Tatsuhito wrote: > >>> On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>>> > >>>> On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>> > >>>>> > >>>>> > >>>>> On 2020/12/02 12:53, Masahiko Sawada wrote: > >>>>>> On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>>>>>> > >>>>>>> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 2020/12/01 16:23, Masahiko Sawada wrote: > >>>>>>>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito > >>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote: > >>>>>>>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito > >>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hi, Thanks for you comments. > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote: > >>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: > >>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > >>>>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > >>>>>>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > >>>>>>>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: > >>>>>>>>>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of the stats > >>>>>>>>>>>>>>>>>>>>>>> data. First probe the existing stats data, and if it shows the table to > >>>>>>>>>>>>>>>>>>>>>>> be already vacuumed, return immediately. If not, *then* force a stats > >>>>>>>>>>>>>>>>>>>>>>> re-read, and check a second time. > >>>>>>>>>>>>>>>>>>>>>> Does the above mean that the second and subsequent table_recheck_autovac() > >>>>>>>>>>>>>>>>>>>>>> will be improved to first check using the previous refreshed statistics? > >>>>>>>>>>>>>>>>>>>>>> I think that certainly works. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to reproduce > >>>>>>>>>>>>>>>>>>>>> what was probably a very similar problem. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> This problem seems to be more likely to occur in cases where you have > >>>>>>>>>>>>>>>>>>>>> a large number of tables, > >>>>>>>>>>>>>>>>>>>>> i.e., a large amount of stats, and many small tables need VACUUM at > >>>>>>>>>>>>>>>>>>>>> the same time. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> So I followed Tom's advice and created a patch for the PoC. > >>>>>>>>>>>>>>>>>>>>> This patch will enable a flag in the table_recheck_autovac function to use > >>>>>>>>>>>>>>>>>>>>> the existing stats next time if VACUUM (or ANALYZE) has already been done > >>>>>>>>>>>>>>>>>>>>> by another worker on the check after the stats have been updated. > >>>>>>>>>>>>>>>>>>>>> If the tables continue to require VACUUM after the refresh, then a refresh > >>>>>>>>>>>>>>>>>>>>> will be required instead of using the existing statistics. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I did simple test with HEAD and HEAD + this PoC patch. > >>>>>>>>>>>>>>>>>>>>> The tests were conducted in two cases. > >>>>>>>>>>>>>>>>>>>>> (I changed few configurations. see attached scripts) > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> 1. Normal VACUUM case > >>>>>>>>>>>>>>>>>>>>> - SET autovacuum = off > >>>>>>>>>>>>>>>>>>>>> - CREATE tables with 100 rows > >>>>>>>>>>>>>>>>>>>>> - DELETE 90 rows for each tables > >>>>>>>>>>>>>>>>>>>>> - SET autovacuum = on and restart PostgreSQL > >>>>>>>>>>>>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> 2. Anti wrap round VACUUM case > >>>>>>>>>>>>>>>>>>>>> - CREATE brank tables > >>>>>>>>>>>>>>>>>>>>> - SELECT all of these tables (for generate stats) > >>>>>>>>>>>>>>>>>>>>> - SET autovacuum_freeze_max_age to low values and restart PostgreSQL > >>>>>>>>>>>>>>>>>>>>> - Consumes a lot of XIDs by using txid_curent() > >>>>>>>>>>>>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> For each test case, the following results were obtained by changing > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. > >>>>>>>>>>>>>>>>>>>>> Also changing num of tables to 1000, 5000, 10000 and 20000. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable, > >>>>>>>>>>>>>>>>>>>>> but I think it's enough to ask for a trend. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> =========================================================================== > >>>>>>>>>>>>>>>>>>>>> [1.Normal VACUUM case] > >>>>>>>>>>>>>>>>>>>>> tables:1000 > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> tables:5000 > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> tables:10000 > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> tables:20000 > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> [2.Anti wrap round VACUUM case] > >>>>>>>>>>>>>>>>>>>>> tables:1000 > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> tables:5000 > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> tables:10000 > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> tables:20000 > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec > >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec > >>>>>>>>>>>>>>>>>>>>> =========================================================================== > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> The cases without patch, the scalability of the worker has decreased > >>>>>>>>>>>>>>>>>>>>> as the number of tables has increased. > >>>>>>>>>>>>>>>>>>>>> In fact, the more workers there are, the longer it takes to complete > >>>>>>>>>>>>>>>>>>>>> VACUUM to all tables. > >>>>>>>>>>>>>>>>>>>>> The cases with patch, it shows good scalability with respect to the > >>>>>>>>>>>>>>>>>>>>> number of workers. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> It seems a good performance improvement even without the patch of > >>>>>>>>>>>>>>>>>>>> shared memory based stats collector. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Sounds great! > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Note that perf top results showed that hash_search_with_hash_value, > >>>>>>>>>>>>>>>>>>>>> hash_seq_search and > >>>>>>>>>>>>>>>>>>>>> pgstat_read_statsfiles are dominant during VACUUM in all patterns, > >>>>>>>>>>>>>>>>>>>>> with or without the patch. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Therefore, there is still a need to find ways to optimize the reading > >>>>>>>>>>>>>>>>>>>>> of large amounts of stats. > >>>>>>>>>>>>>>>>>>>>> However, this patch is effective in its own right, and since there are > >>>>>>>>>>>>>>>>>>>>> only a few parts to modify, > >>>>>>>>>>>>>>>>>>>>> I think it should be able to be applied to current (preferably > >>>>>>>>>>>>>>>>>>>>> pre-v13) PostgreSQL. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> +1 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>>>>> + /* We might be better to refresh stats */ > >>>>>>>>>>>>>>>>>>>> + use_existing_stats = false; > >>>>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>>>> + else > >>>>>>>>>>>>>>>>>>>> + { > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> - heap_freetuple(classTup); > >>>>>>>>>>>>>>>>>>>> + heap_freetuple(classTup); > >>>>>>>>>>>>>>>>>>>> + /* The relid has already vacuumed, so we might be better to > >>>>>>>>>>>>>>>>>>>> use exiting stats */ > >>>>>>>>>>>>>>>>>>>> + use_existing_stats = true; > >>>>>>>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> With that patch, the autovacuum process refreshes the stats in the > >>>>>>>>>>>>>>>>>>>> next check if it finds out that the table still needs to be vacuumed. > >>>>>>>>>>>>>>>>>>>> But I guess it's not necessarily true because the next table might be > >>>>>>>>>>>>>>>>>>>> vacuumed already. So I think we might want to always use the existing > >>>>>>>>>>>>>>>>>>>> for the first check. What do you think? > >>>>>>>>>>>>>>>>>>> Thanks for your comment. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> If we assume the case where some workers vacuum on large tables > >>>>>>>>>>>>>>>>>>> and a single worker vacuum on small tables, the processing > >>>>>>>>>>>>>>>>>>> performance of the single worker will be slightly lower if the > >>>>>>>>>>>>>>>>>>> existing statistics are checked every time. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> In fact, at first I tried to check the existing stats every time, > >>>>>>>>>>>>>>>>>>> but the performance was slightly worse in cases with a small number of workers. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Do you have this benchmark result? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> (Checking the existing stats is lightweight , but at high frequency, > >>>>>>>>>>>>>>>>>>> it affects processing performance.) > >>>>>>>>>>>>>>>>>>> Therefore, at after refresh statistics, determine whether autovac > >>>>>>>>>>>>>>>>>>> should use the existing statistics. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Yeah, since the test you used uses a lot of small tables, if there are > >>>>>>>>>>>>>>>>>> a few workers, checking the existing stats is unlikely to return true > >>>>>>>>>>>>>>>>>> (no need to vacuum). So the cost of existing stats check ends up being > >>>>>>>>>>>>>>>>>> overhead. Not sure how slow always checking the existing stats was, > >>>>>>>>>>>>>>>>>> but given that the shared memory based stats collector patch could > >>>>>>>>>>>>>>>>>> improve the performance of refreshing stats, it might be better not to > >>>>>>>>>>>>>>>>>> check the existing stats frequently like the patch does. Anyway, I > >>>>>>>>>>>>>>>>>> think it’s better to evaluate the performance improvement with other > >>>>>>>>>>>>>>>>>> cases too. > >>>>>>>>>>>>>>>>> Yeah, I would like to see how much the performance changes in other cases. > >>>>>>>>>>>>>>>>> In addition, if the shared-based-stats patch is applied, we won't need to reload > >>>>>>>>>>>>>>>>> a huge stats file, so we will just have to check the stats on > >>>>>>>>>>>>>>>>> shared-mem every time. > >>>>>>>>>>>>>>>>> Perhaps the logic of table_recheck_autovac could be simpler. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> BTW, I found some typos in comments, so attache a fixed version. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The patch adds some duplicated codes into table_recheck_autovac(). > >>>>>>>>>>>>>>>> It's better to make the common function performing them and make > >>>>>>>>>>>>>>>> table_recheck_autovac() call that common function, to simplify the code. > >>>>>>>>>>>>>>> Thanks for your comment. > >>>>>>>>>>>>>>> Hmm.. I've cut out the duplicate part. > >>>>>>>>>>>>>>> Attach the patch. > >>>>>>>>>>>>>>> Could you confirm that it fits your expecting? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Yes, thanks for updataing the patch! Here are another review comments. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> + shared = pgstat_fetch_stat_dbentry(InvalidOid); > >>>>>>>>>>>>>> + dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId); > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> When using the existing stats, ISTM that these are not necessary and > >>>>>>>>>>>>>> we can reuse "shared" and "dbentry" obtained before. Right? > >>>>>>>>>>>>> Yeah, but unless autovac_refresh_stats() is called, these functions > >>>>>>>>>>>>> read the information from the > >>>>>>>>>>>>> local hash table without re-read the stats file, so the process is very light. > >>>>>>>>>>>>> Therefore, I think, it is better to keep the current logic to keep the > >>>>>>>>>>>>> code simple. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> + /* We might be better to refresh stats */ > >>>>>>>>>>>>>> + use_existing_stats = false; > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think that we should add more comments about why it's better to > >>>>>>>>>>>>>> refresh the stats in this case. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> + /* The relid has already vacuumed, so we might be better to use existing stats */ > >>>>>>>>>>>>>> + use_existing_stats = true; > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think that we should add more comments about why it's better to > >>>>>>>>>>>>>> reuse the stats in this case. > >>>>>>>>>>>>> I added comments. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Attache the patch. > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Thank you for updating the patch. Here are some small comments on the > >>>>>>>>>>>> latest (v4) patch. > >>>>>>>>>>>> > >>>>>>>>>>>> + * So if the last time we checked a table that was already vacuumed after > >>>>>>>>>>>> + * refres stats, check the current statistics before refreshing it. > >>>>>>>>>>>> + */ > >>>>>>>>>>>> > >>>>>>>>>>>> s/refres/refresh/ > >>>>>>>>>> Thanks! fixed. > >>>>>>>>>> Attached the patch. > >>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> ----- > >>>>>>>>>>>> +/* Counter to determine if statistics should be refreshed */ > >>>>>>>>>>>> +static bool use_existing_stats = false; > >>>>>>>>>>>> + > >>>>>>>>>>>> > >>>>>>>>>>>> I think 'use_existing_stats' can be declared within table_recheck_autovac(). > >>>>>>>>>>>> > >>>>>>>>>>>> ----- > >>>>>>>>>>>> While testing the performance, I realized that the statistics are > >>>>>>>>>>>> reset every time vacuumed one table, leading to re-reading the stats > >>>>>>>>>>>> file even if 'use_existing_stats' is true. Please refer that vacuum() > >>>>>>>>>>>> eventually calls AtEOXact_PgStat() which calls to > >>>>>>>>>>>> pgstat_clear_snapshot(). > >>>>>>>>>>> > >>>>>>>>>>> Good catch! > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> I believe that's why the performance of the > >>>>>>>>>>>> method of always checking the existing stats wasn’t good as expected. > >>>>>>>>>>>> So if we save the statistics somewhere and use it for rechecking, the > >>>>>>>>>>>> results of the performance benchmark will differ between these two > >>>>>>>>>>>> methods. > >>>>>>>>>> Thanks for you checks. > >>>>>>>>>> But, if a worker did vacuum(), that means this worker had determined > >>>>>>>>>> need vacuum in the > >>>>>>>>>> table_recheck_autovac(). So, use_existing_stats set to false, and next > >>>>>>>>>> time, refresh stats. > >>>>>>>>>> Therefore I think the current patch is fine, as we want to avoid > >>>>>>>>>> unnecessary refreshing of > >>>>>>>>>> statistics before the actual vacuum(), right? > >>>>>>>>> > >>>>>>>>> Yes, you're right. > >>>>>>>>> > >>>>>>>>> When I benchmarked the performance of the method of always checking > >>>>>>>>> existing stats I edited your patch so that it sets use_existing_stats > >>>>>>>>> = true even if the second check is false (i.g., vacuum is needed). > >>>>>>>>> And the result I got was worse than expected especially in the case of > >>>>>>>>> a few autovacuum workers. But it doesn't evaluate the performance of > >>>>>>>>> that method rightly as the stats snapshot is cleared every time > >>>>>>>>> vacuum. Given you had similar results, I guess you used a similar way > >>>>>>>>> when evaluating it, is it right? If so, it’s better to fix this issue > >>>>>>>>> and see how the performance benchmark results will differ. > >>>>>>>>> > >>>>>>>>> For example, the results of the test case with 10000 tables and 1 > >>>>>>>>> autovacuum worker I reported before was: > >>>>>>>>> > >>>>>>>>> 10000 tables: > >>>>>>>>> autovac_workers 1 : 158s,157s, 290s > >>>>>>>>> > >>>>>>>>> But after fixing that issue in the third method (always checking the > >>>>>>>>> existing stats), the results are: > >>>>>>>> > >>>>>>>> Could you tell me how you fixed that issue? You copied the stats to > >>>>>>>> somewhere as you suggested or skipped pgstat_clear_snapshot() as > >>>>>>>> I suggested? > >>>>>>> > >>>>>>> I used the way you suggested in this quick test; skipped > >>>>>>> pgstat_clear_snapshot(). > >>>>>>> > >>>>>>>> > >>>>>>>> Kasahara-san seems not to like the latter idea because it might > >>>>>>>> cause bad side effect. So we should use the former idea? > >>>>>>> > >>>>>>> Not sure. I'm also concerned about the side effect but I've not checked yet. > >>>>>>> > >>>>>>> Since probably there is no big difference between the two ways in > >>>>>>> terms of performance I'm going to see how the performance benchmark > >>>>>>> result will change first. > >>>>>> > >>>>>> I've tested performance improvement again. From the left the execution > >>>>>> time of the current HEAD, Kasahara-san's patch, the method of always > >>>>>> checking the existing stats (using approach suggested by Fujii-san), > >>>>>> in seconds. > >>>>>> > >>>>>> 1000 tables: > >>>>>> autovac_workers 1 : 13s, 13s, 13s > >>>>>> autovac_workers 2 : 6s, 4s, 4s > >>>>>> autovac_workers 3 : 3s, 4s, 3s > >>>>>> autovac_workers 5 : 3s, 3s, 2s > >>>>>> autovac_workers 10: 2s, 3s, 2s > >>>>>> > >>>>>> 5000 tables: > >>>>>> autovac_workers 1 : 71s, 71s, 72s > >>>>>> autovac_workers 2 : 37s, 32s, 32s > >>>>>> autovac_workers 3 : 29s, 26s, 26s > >>>>>> autovac_workers 5 : 20s, 19s, 18s > >>>>>> autovac_workers 10: 13s, 8s, 8s > >>>>>> > >>>>>> 10000 tables: > >>>>>> autovac_workers 1 : 158s,157s, 159s > >>>>>> autovac_workers 2 : 80s, 53s, 78s > >>>>>> autovac_workers 3 : 75s, 67s, 67s > >>>>>> autovac_workers 5 : 61s, 42s, 42s > >>>>>> autovac_workers 10: 69s, 26s, 25s > >>>>>> > >>>>>> 20000 tables: > >>>>>> autovac_workers 1 : 379s, 380s, 389s > >>>>>> autovac_workers 2 : 236s, 232s, 233s > >>>>>> autovac_workers 3 : 222s, 181s, 182s > >>>>>> autovac_workers 5 : 212s, 132s, 139s > >>>>>> autovac_workers 10: 317s, 91s, 89s > >>>>>> > >>>>>> I don't see a big difference between Kasahara-san's patch and the > >>>>>> method of always checking the existing stats. > >>>>> > >>>>> Thanks for doing the benchmark! > >>>>> > >>>>> This benchmark result makes me think that we don't need to tweak > >>>>> AtEOXact_PgStat() and can use Kasahara-san approach. > >>>>> That's good news :) > >>>> > >>>> Yeah, given that all autovaucum workers have the list of tables to > >>>> vacuum in the same order in most cases, the assumption in > >>>> Kasahara-san’s patch that if a worker needs to vacuum a table it’s > >>>> unlikely that it will be able to skip the next table using the current > >>>> snapshot of stats makes sense to me. > >>>> > >>>> One small comment on v6 patch: > >>>> > >>>> + /* When we decide to do vacuum or analyze, the existing stats cannot > >>>> + * be reused in the next cycle because it's cleared at the end of vacuum > >>>> + * or analyze (by AtEOXact_PgStat()). > >>>> + */ > >>>> + use_existing_stats = false; > >>>> > >>>> I think the comment should start on the second line (i.g., \n is > >>>> needed after /*). > >>> Oops, thanks. > >>> Fixed. > >> > >> Thanks for updating the patch! > >> > >> I applied the following cosmetic changes to the patch. > >> Attached is the updated version of the patch. > >> Coud you review this version? > > Thanks for tweaking the patch. > > > >> - Ran pgindent to fix some warnings that "git diff --check" > >> reported on the patch. > >> - Made the order of arguments consistent between > >> recheck_relation_needs_vacanalyze and relation_needs_vacanalyze. > >> - Renamed the variable use_existing_stats to reuse_stats for simplicity. > >> - Added more comments. > > I think it's no problem. > > The patch passed makecheck, and I benchmarked "Anti wrap round VACUUM > > case" (only 20000 tables) just in case. > > > > From the left the execution time of the current HEAD, v8 patch. > > tables 20000: > > autovac workers 1: 319sec, 315sec > > autovac workers 2: 301sec, 190sec > > autovac workers 3: 270sec, 133sec > > autovac workers 5: 211sec, 86sec > > autovac workers 10: 376sec, 68sec > > > > It's as expected. > > Thanks! > > > >> Barring any objection, I'm thinking to commit this version. > > +1 > > Pushed. Thanks ! > > Regards, > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION -- Tatsuhito Kasahara kasahara.tatsuhito _at_ gmail.com
pgsql-hackers by date: