Re: autovac issue with large number of tables - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: autovac issue with large number of tables |
Date | |
Msg-id | 7c71a618-2d79-22a0-a40c-f2cd23c30f86@oss.nttdata.com Whole thread Raw |
In response to | Re: autovac issue with large number of tables (Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com>) |
Responses |
Re: autovac issue with large number of tables
|
List | pgsql-hackers |
On 2020/12/04 12:21, Kasahara Tatsuhito wrote: > Hi, > > On Thu, Dec 3, 2020 at 9:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/12/03 11:46, Kasahara Tatsuhito wrote: >>> On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> >>>> On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>> >>>>> >>>>> >>>>> On 2020/12/02 12:53, Masahiko Sawada wrote: >>>>>> On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>>>> >>>>>>> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2020/12/01 16:23, Masahiko Sawada wrote: >>>>>>>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito >>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote: >>>>>>>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito >>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, Thanks for you comments. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote: >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: >>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito >>>>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito >>>>>>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito >>>>>>>>>>>>>>>>>>>>> <kasahara.tatsuhito@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of the stats >>>>>>>>>>>>>>>>>>>>>>> data. First probe the existing stats data, and if it shows the table to >>>>>>>>>>>>>>>>>>>>>>> be already vacuumed, return immediately. If not, *then* force a stats >>>>>>>>>>>>>>>>>>>>>>> re-read, and check a second time. >>>>>>>>>>>>>>>>>>>>>> Does the above mean that the second and subsequent table_recheck_autovac() >>>>>>>>>>>>>>>>>>>>>> will be improved to first check using the previous refreshed statistics? >>>>>>>>>>>>>>>>>>>>>> I think that certainly works. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to reproduce >>>>>>>>>>>>>>>>>>>>> what was probably a very similar problem. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> This problem seems to be more likely to occur in cases where you have >>>>>>>>>>>>>>>>>>>>> a large number of tables, >>>>>>>>>>>>>>>>>>>>> i.e., a large amount of stats, and many small tables need VACUUM at >>>>>>>>>>>>>>>>>>>>> the same time. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> So I followed Tom's advice and created a patch for the PoC. >>>>>>>>>>>>>>>>>>>>> This patch will enable a flag in the table_recheck_autovac function to use >>>>>>>>>>>>>>>>>>>>> the existing stats next time if VACUUM (or ANALYZE) has already been done >>>>>>>>>>>>>>>>>>>>> by another worker on the check after the stats have been updated. >>>>>>>>>>>>>>>>>>>>> If the tables continue to require VACUUM after the refresh, then a refresh >>>>>>>>>>>>>>>>>>>>> will be required instead of using the existing statistics. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I did simple test with HEAD and HEAD + this PoC patch. >>>>>>>>>>>>>>>>>>>>> The tests were conducted in two cases. >>>>>>>>>>>>>>>>>>>>> (I changed few configurations. see attached scripts) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. Normal VACUUM case >>>>>>>>>>>>>>>>>>>>> - SET autovacuum = off >>>>>>>>>>>>>>>>>>>>> - CREATE tables with 100 rows >>>>>>>>>>>>>>>>>>>>> - DELETE 90 rows for each tables >>>>>>>>>>>>>>>>>>>>> - SET autovacuum = on and restart PostgreSQL >>>>>>>>>>>>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2. Anti wrap round VACUUM case >>>>>>>>>>>>>>>>>>>>> - CREATE brank tables >>>>>>>>>>>>>>>>>>>>> - SELECT all of these tables (for generate stats) >>>>>>>>>>>>>>>>>>>>> - SET autovacuum_freeze_max_age to low values and restart PostgreSQL >>>>>>>>>>>>>>>>>>>>> - Consumes a lot of XIDs by using txid_curent() >>>>>>>>>>>>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> For each test case, the following results were obtained by changing >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. >>>>>>>>>>>>>>>>>>>>> Also changing num of tables to 1000, 5000, 10000 and 20000. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable, >>>>>>>>>>>>>>>>>>>>> but I think it's enough to ask for a trend. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> =========================================================================== >>>>>>>>>>>>>>>>>>>>> [1.Normal VACUUM case] >>>>>>>>>>>>>>>>>>>>> tables:1000 >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> tables:5000 >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> tables:10000 >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> tables:20000 >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [2.Anti wrap round VACUUM case] >>>>>>>>>>>>>>>>>>>>> tables:1000 >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> tables:5000 >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> tables:10000 >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> tables:20000 >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec >>>>>>>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec >>>>>>>>>>>>>>>>>>>>> =========================================================================== >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The cases without patch, the scalability of the worker has decreased >>>>>>>>>>>>>>>>>>>>> as the number of tables has increased. >>>>>>>>>>>>>>>>>>>>> In fact, the more workers there are, the longer it takes to complete >>>>>>>>>>>>>>>>>>>>> VACUUM to all tables. >>>>>>>>>>>>>>>>>>>>> The cases with patch, it shows good scalability with respect to the >>>>>>>>>>>>>>>>>>>>> number of workers. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It seems a good performance improvement even without the patch of >>>>>>>>>>>>>>>>>>>> shared memory based stats collector. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sounds great! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Note that perf top results showed that hash_search_with_hash_value, >>>>>>>>>>>>>>>>>>>>> hash_seq_search and >>>>>>>>>>>>>>>>>>>>> pgstat_read_statsfiles are dominant during VACUUM in all patterns, >>>>>>>>>>>>>>>>>>>>> with or without the patch. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Therefore, there is still a need to find ways to optimize the reading >>>>>>>>>>>>>>>>>>>>> of large amounts of stats. >>>>>>>>>>>>>>>>>>>>> However, this patch is effective in its own right, and since there are >>>>>>>>>>>>>>>>>>>>> only a few parts to modify, >>>>>>>>>>>>>>>>>>>>> I think it should be able to be applied to current (preferably >>>>>>>>>>>>>>>>>>>>> pre-v13) PostgreSQL. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>>> + /* We might be better to refresh stats */ >>>>>>>>>>>>>>>>>>>> + use_existing_stats = false; >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> + else >>>>>>>>>>>>>>>>>>>> + { >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> - heap_freetuple(classTup); >>>>>>>>>>>>>>>>>>>> + heap_freetuple(classTup); >>>>>>>>>>>>>>>>>>>> + /* The relid has already vacuumed, so we might be better to >>>>>>>>>>>>>>>>>>>> use exiting stats */ >>>>>>>>>>>>>>>>>>>> + use_existing_stats = true; >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> With that patch, the autovacuum process refreshes the stats in the >>>>>>>>>>>>>>>>>>>> next check if it finds out that the table still needs to be vacuumed. >>>>>>>>>>>>>>>>>>>> But I guess it's not necessarily true because the next table might be >>>>>>>>>>>>>>>>>>>> vacuumed already. So I think we might want to always use the existing >>>>>>>>>>>>>>>>>>>> for the first check. What do you think? >>>>>>>>>>>>>>>>>>> Thanks for your comment. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If we assume the case where some workers vacuum on large tables >>>>>>>>>>>>>>>>>>> and a single worker vacuum on small tables, the processing >>>>>>>>>>>>>>>>>>> performance of the single worker will be slightly lower if the >>>>>>>>>>>>>>>>>>> existing statistics are checked every time. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> In fact, at first I tried to check the existing stats every time, >>>>>>>>>>>>>>>>>>> but the performance was slightly worse in cases with a small number of workers. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Do you have this benchmark result? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> (Checking the existing stats is lightweight , but at high frequency, >>>>>>>>>>>>>>>>>>> it affects processing performance.) >>>>>>>>>>>>>>>>>>> Therefore, at after refresh statistics, determine whether autovac >>>>>>>>>>>>>>>>>>> should use the existing statistics. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yeah, since the test you used uses a lot of small tables, if there are >>>>>>>>>>>>>>>>>> a few workers, checking the existing stats is unlikely to return true >>>>>>>>>>>>>>>>>> (no need to vacuum). So the cost of existing stats check ends up being >>>>>>>>>>>>>>>>>> overhead. Not sure how slow always checking the existing stats was, >>>>>>>>>>>>>>>>>> but given that the shared memory based stats collector patch could >>>>>>>>>>>>>>>>>> improve the performance of refreshing stats, it might be better not to >>>>>>>>>>>>>>>>>> check the existing stats frequently like the patch does. Anyway, I >>>>>>>>>>>>>>>>>> think it’s better to evaluate the performance improvement with other >>>>>>>>>>>>>>>>>> cases too. >>>>>>>>>>>>>>>>> Yeah, I would like to see how much the performance changes in other cases. >>>>>>>>>>>>>>>>> In addition, if the shared-based-stats patch is applied, we won't need to reload >>>>>>>>>>>>>>>>> a huge stats file, so we will just have to check the stats on >>>>>>>>>>>>>>>>> shared-mem every time. >>>>>>>>>>>>>>>>> Perhaps the logic of table_recheck_autovac could be simpler. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> BTW, I found some typos in comments, so attache a fixed version. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The patch adds some duplicated codes into table_recheck_autovac(). >>>>>>>>>>>>>>>> It's better to make the common function performing them and make >>>>>>>>>>>>>>>> table_recheck_autovac() call that common function, to simplify the code. >>>>>>>>>>>>>>> Thanks for your comment. >>>>>>>>>>>>>>> Hmm.. I've cut out the duplicate part. >>>>>>>>>>>>>>> Attach the patch. >>>>>>>>>>>>>>> Could you confirm that it fits your expecting? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, thanks for updataing the patch! Here are another review comments. >>>>>>>>>>>>>> >>>>>>>>>>>>>> + shared = pgstat_fetch_stat_dbentry(InvalidOid); >>>>>>>>>>>>>> + dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId); >>>>>>>>>>>>>> >>>>>>>>>>>>>> When using the existing stats, ISTM that these are not necessary and >>>>>>>>>>>>>> we can reuse "shared" and "dbentry" obtained before. Right? >>>>>>>>>>>>> Yeah, but unless autovac_refresh_stats() is called, these functions >>>>>>>>>>>>> read the information from the >>>>>>>>>>>>> local hash table without re-read the stats file, so the process is very light. >>>>>>>>>>>>> Therefore, I think, it is better to keep the current logic to keep the >>>>>>>>>>>>> code simple. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> + /* We might be better to refresh stats */ >>>>>>>>>>>>>> + use_existing_stats = false; >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think that we should add more comments about why it's better to >>>>>>>>>>>>>> refresh the stats in this case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> + /* The relid has already vacuumed, so we might be better to use existing stats */ >>>>>>>>>>>>>> + use_existing_stats = true; >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think that we should add more comments about why it's better to >>>>>>>>>>>>>> reuse the stats in this case. >>>>>>>>>>>>> I added comments. >>>>>>>>>>>>> >>>>>>>>>>>>> Attache the patch. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thank you for updating the patch. Here are some small comments on the >>>>>>>>>>>> latest (v4) patch. >>>>>>>>>>>> >>>>>>>>>>>> + * So if the last time we checked a table that was already vacuumed after >>>>>>>>>>>> + * refres stats, check the current statistics before refreshing it. >>>>>>>>>>>> + */ >>>>>>>>>>>> >>>>>>>>>>>> s/refres/refresh/ >>>>>>>>>> Thanks! fixed. >>>>>>>>>> Attached the patch. >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ----- >>>>>>>>>>>> +/* Counter to determine if statistics should be refreshed */ >>>>>>>>>>>> +static bool use_existing_stats = false; >>>>>>>>>>>> + >>>>>>>>>>>> >>>>>>>>>>>> I think 'use_existing_stats' can be declared within table_recheck_autovac(). >>>>>>>>>>>> >>>>>>>>>>>> ----- >>>>>>>>>>>> While testing the performance, I realized that the statistics are >>>>>>>>>>>> reset every time vacuumed one table, leading to re-reading the stats >>>>>>>>>>>> file even if 'use_existing_stats' is true. Please refer that vacuum() >>>>>>>>>>>> eventually calls AtEOXact_PgStat() which calls to >>>>>>>>>>>> pgstat_clear_snapshot(). >>>>>>>>>>> >>>>>>>>>>> Good catch! >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I believe that's why the performance of the >>>>>>>>>>>> method of always checking the existing stats wasn’t good as expected. >>>>>>>>>>>> So if we save the statistics somewhere and use it for rechecking, the >>>>>>>>>>>> results of the performance benchmark will differ between these two >>>>>>>>>>>> methods. >>>>>>>>>> Thanks for you checks. >>>>>>>>>> But, if a worker did vacuum(), that means this worker had determined >>>>>>>>>> need vacuum in the >>>>>>>>>> table_recheck_autovac(). So, use_existing_stats set to false, and next >>>>>>>>>> time, refresh stats. >>>>>>>>>> Therefore I think the current patch is fine, as we want to avoid >>>>>>>>>> unnecessary refreshing of >>>>>>>>>> statistics before the actual vacuum(), right? >>>>>>>>> >>>>>>>>> Yes, you're right. >>>>>>>>> >>>>>>>>> When I benchmarked the performance of the method of always checking >>>>>>>>> existing stats I edited your patch so that it sets use_existing_stats >>>>>>>>> = true even if the second check is false (i.g., vacuum is needed). >>>>>>>>> And the result I got was worse than expected especially in the case of >>>>>>>>> a few autovacuum workers. But it doesn't evaluate the performance of >>>>>>>>> that method rightly as the stats snapshot is cleared every time >>>>>>>>> vacuum. Given you had similar results, I guess you used a similar way >>>>>>>>> when evaluating it, is it right? If so, it’s better to fix this issue >>>>>>>>> and see how the performance benchmark results will differ. >>>>>>>>> >>>>>>>>> For example, the results of the test case with 10000 tables and 1 >>>>>>>>> autovacuum worker I reported before was: >>>>>>>>> >>>>>>>>> 10000 tables: >>>>>>>>> autovac_workers 1 : 158s,157s, 290s >>>>>>>>> >>>>>>>>> But after fixing that issue in the third method (always checking the >>>>>>>>> existing stats), the results are: >>>>>>>> >>>>>>>> Could you tell me how you fixed that issue? You copied the stats to >>>>>>>> somewhere as you suggested or skipped pgstat_clear_snapshot() as >>>>>>>> I suggested? >>>>>>> >>>>>>> I used the way you suggested in this quick test; skipped >>>>>>> pgstat_clear_snapshot(). >>>>>>> >>>>>>>> >>>>>>>> Kasahara-san seems not to like the latter idea because it might >>>>>>>> cause bad side effect. So we should use the former idea? >>>>>>> >>>>>>> Not sure. I'm also concerned about the side effect but I've not checked yet. >>>>>>> >>>>>>> Since probably there is no big difference between the two ways in >>>>>>> terms of performance I'm going to see how the performance benchmark >>>>>>> result will change first. >>>>>> >>>>>> I've tested performance improvement again. From the left the execution >>>>>> time of the current HEAD, Kasahara-san's patch, the method of always >>>>>> checking the existing stats (using approach suggested by Fujii-san), >>>>>> in seconds. >>>>>> >>>>>> 1000 tables: >>>>>> autovac_workers 1 : 13s, 13s, 13s >>>>>> autovac_workers 2 : 6s, 4s, 4s >>>>>> autovac_workers 3 : 3s, 4s, 3s >>>>>> autovac_workers 5 : 3s, 3s, 2s >>>>>> autovac_workers 10: 2s, 3s, 2s >>>>>> >>>>>> 5000 tables: >>>>>> autovac_workers 1 : 71s, 71s, 72s >>>>>> autovac_workers 2 : 37s, 32s, 32s >>>>>> autovac_workers 3 : 29s, 26s, 26s >>>>>> autovac_workers 5 : 20s, 19s, 18s >>>>>> autovac_workers 10: 13s, 8s, 8s >>>>>> >>>>>> 10000 tables: >>>>>> autovac_workers 1 : 158s,157s, 159s >>>>>> autovac_workers 2 : 80s, 53s, 78s >>>>>> autovac_workers 3 : 75s, 67s, 67s >>>>>> autovac_workers 5 : 61s, 42s, 42s >>>>>> autovac_workers 10: 69s, 26s, 25s >>>>>> >>>>>> 20000 tables: >>>>>> autovac_workers 1 : 379s, 380s, 389s >>>>>> autovac_workers 2 : 236s, 232s, 233s >>>>>> autovac_workers 3 : 222s, 181s, 182s >>>>>> autovac_workers 5 : 212s, 132s, 139s >>>>>> autovac_workers 10: 317s, 91s, 89s >>>>>> >>>>>> I don't see a big difference between Kasahara-san's patch and the >>>>>> method of always checking the existing stats. >>>>> >>>>> Thanks for doing the benchmark! >>>>> >>>>> This benchmark result makes me think that we don't need to tweak >>>>> AtEOXact_PgStat() and can use Kasahara-san approach. >>>>> That's good news :) >>>> >>>> Yeah, given that all autovaucum workers have the list of tables to >>>> vacuum in the same order in most cases, the assumption in >>>> Kasahara-san’s patch that if a worker needs to vacuum a table it’s >>>> unlikely that it will be able to skip the next table using the current >>>> snapshot of stats makes sense to me. >>>> >>>> One small comment on v6 patch: >>>> >>>> + /* When we decide to do vacuum or analyze, the existing stats cannot >>>> + * be reused in the next cycle because it's cleared at the end of vacuum >>>> + * or analyze (by AtEOXact_PgStat()). >>>> + */ >>>> + use_existing_stats = false; >>>> >>>> I think the comment should start on the second line (i.g., \n is >>>> needed after /*). >>> Oops, thanks. >>> Fixed. >> >> Thanks for updating the patch! >> >> I applied the following cosmetic changes to the patch. >> Attached is the updated version of the patch. >> Coud you review this version? > Thanks for tweaking the patch. > >> - Ran pgindent to fix some warnings that "git diff --check" >> reported on the patch. >> - Made the order of arguments consistent between >> recheck_relation_needs_vacanalyze and relation_needs_vacanalyze. >> - Renamed the variable use_existing_stats to reuse_stats for simplicity. >> - Added more comments. > I think it's no problem. > The patch passed makecheck, and I benchmarked "Anti wrap round VACUUM > case" (only 20000 tables) just in case. > > From the left the execution time of the current HEAD, v8 patch. > tables 20000: > autovac workers 1: 319sec, 315sec > autovac workers 2: 301sec, 190sec > autovac workers 3: 270sec, 133sec > autovac workers 5: 211sec, 86sec > autovac workers 10: 376sec, 68sec > > It's as expected. Thanks! >> Barring any objection, I'm thinking to commit this version. > +1 Pushed. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
pgsql-hackers by date: