Re: autovac issue with large number of tables - Mailing list pgsql-hackers
From: Kasahara Tatsuhito
Subject: Re: autovac issue with large number of tables
Msg-id: CAP0=ZV+9StnvXdFEyVPD7-ep4kCfqVcUBZzQb=ZmkbELk7RcGA@mail.gmail.com
In response to: Re: autovac issue with large number of tables (Fujii Masao <masao.fujii@oss.nttdata.com>)
List: pgsql-hackers
Hi

On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/12/02 12:53, Masahiko Sawada wrote:
> > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>> On 2020/12/01 16:23, Masahiko Sawada wrote:
> >>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote:
> >>>>> Hi,
> >>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote:
> >>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote:
> >>>>>>>> Hi, Thanks for you comments.
> >>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> >>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote:
> >>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of the stats data. First probe the existing stats data, and if it shows the table to be already vacuumed, return immediately. If not, *then* force a stats re-read, and check a second time.
> >>>>>>>>>>>>>>>>> Does the above mean that the second and subsequent table_recheck_autovac() will be improved to first check using the previous refreshed statistics? I think that certainly works.
> >>>>>>>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC
> >>>>>>>>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to reproduce what was probably a very similar problem.
> >>>>>>>>>>>>>>>> This problem seems to be more likely to occur in cases where you have a large number of tables, i.e., a large amount of stats, and many small tables need VACUUM at the same time.
> >>>>>>>>>>>>>>>> So I followed Tom's advice and created a patch for the PoC. This patch will enable a flag in the table_recheck_autovac function to use the existing stats next time if VACUUM (or ANALYZE) has already been done by another worker on the check after the stats have been updated.
> >>>>>>>>>>>>>>>> If the tables continue to require VACUUM after the refresh, then a refresh will be required instead of using the existing statistics.
> >>>>>>>>>>>>>>>> I did simple test with HEAD and HEAD + this PoC patch. The tests were conducted in two cases. (I changed few configurations. see attached scripts)
> >>>>>>>>>>>>>>>> 1. Normal VACUUM case
> >>>>>>>>>>>>>>>>  - SET autovacuum = off
> >>>>>>>>>>>>>>>>  - CREATE tables with 100 rows
> >>>>>>>>>>>>>>>>  - DELETE 90 rows for each tables
> >>>>>>>>>>>>>>>>  - SET autovacuum = on and restart PostgreSQL
> >>>>>>>>>>>>>>>>  - Measure the time it takes for all tables to be VACUUMed
> >>>>>>>>>>>>>>>> 2. Anti wrap round VACUUM case
> >>>>>>>>>>>>>>>>  - CREATE brank tables
> >>>>>>>>>>>>>>>>  - SELECT all of these tables (for generate stats)
> >>>>>>>>>>>>>>>>  - SET autovacuum_freeze_max_age to low values and restart PostgreSQL
> >>>>>>>>>>>>>>>>  - Consumes a lot of XIDs by using txid_curent()
> >>>>>>>>>>>>>>>>  - Measure the time it takes for all tables to be VACUUMed
> >>>>>>>>>>>>>>>> For each test case, the following results were obtained by changing autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10. Also changing num of tables to 1000, 5000, 10000 and 20000.
> >>>>>>>>>>>>>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable, but I think it's enough to ask for a trend.
> >>>>>>>>>>>>>>>> ===========================================================================
> >>>>>>>>>>>>>>>> [1.Normal VACUUM case]
> >>>>>>>>>>>>>>>> tables:1000
> >>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec
> >>>>>>>>>>>>>>>> tables:5000
> >>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec
> >>>>>>>>>>>>>>>> tables:10000
> >>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec
> >>>>>>>>>>>>>>>> tables:20000
> >>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec
> >>>>>>>>>>>>>>>> [2.Anti wrap round VACUUM case]
> >>>>>>>>>>>>>>>> tables:1000
> >>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec
> >>>>>>>>>>>>>>>> tables:5000
> >>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec
> >>>>>>>>>>>>>>>> tables:10000
> >>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec
> >>>>>>>>>>>>>>>> tables:20000
> >>>>>>>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
> >>>>>>>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
> >>>>>>>>>>>>>>>> ===========================================================================
> >>>>>>>>>>>>>>>> The cases without patch, the scalability of the worker has decreased as the number of tables has increased. In fact, the more workers there are, the longer it takes to complete VACUUM to all tables. The cases with patch, it shows good scalability with respect to the number of workers.
> >>>>>>>>>>>>>>> It seems a good performance improvement even without the patch of shared memory based stats collector.
> >>>>>>>>>>> Sounds great!
> >>>>>>>>>>>>>>>> Note that perf top results showed that hash_search_with_hash_value, hash_seq_search and pgstat_read_statsfiles are dominant during VACUUM in all patterns, with or without the patch.
> >>>>>>>>>>>>>>>> Therefore, there is still a need to find ways to optimize the reading of large amounts of stats. However, this patch is effective in its own right, and since there are only a few parts to modify, I think it should be able to be applied to current (preferably pre-v13) PostgreSQL.
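To make the flow of the PoC above easier to follow, here is a minimal standalone C sketch of the recheck logic it describes. It is not the actual autovacuum code: the table list, the simulated stats, and the needs-vacuum threshold are invented stand-ins, and only the control flow (reuse the already-loaded stats for the next recheck when the previous table turned out to be vacuumed already, otherwise force a refresh and check a second time) is modeled.

/*
 * Standalone sketch of the "reuse existing stats on the next recheck" idea.
 * Stand-ins only: no real pgstat or catalog code is used here.
 */
#include <stdbool.h>
#include <stdio.h>

#define NTABLES 8

static int  dead_tuples[NTABLES];   /* "shared" stats, as other workers see them */
static int  snapshot[NTABLES];      /* this worker's cached stats snapshot */
static bool use_existing_stats = false;

static void
refresh_stats(void)
{
    /* stand-in for throwing away and re-reading the stats file */
    for (int i = 0; i < NTABLES; i++)
        snapshot[i] = dead_tuples[i];
    printf("  (stats refreshed)\n");
}

static bool
needs_vacuum(int relid)
{
    return snapshot[relid] > 50;    /* made-up threshold */
}

static bool
recheck_autovac(int relid)
{
    bool        dovacuum;

    if (use_existing_stats)
    {
        /* First probe: the snapshot we already have. */
        dovacuum = needs_vacuum(relid);
        if (!dovacuum)
            return false;       /* someone else vacuumed it; keep reusing the snapshot */
    }

    /* Second probe: force a refresh and check again. */
    refresh_stats();
    dovacuum = needs_vacuum(relid);

    /* Reuse the snapshot next time only if this table turned out to be done already. */
    use_existing_stats = !dovacuum;
    return dovacuum;
}

int
main(void)
{
    for (int i = 0; i < NTABLES; i++)
        dead_tuples[i] = (i % 2) ? 90 : 10;     /* half the tables "need vacuum" */

    for (int relid = 0; relid < NTABLES; relid++)
        printf("table %d: %s\n", relid,
               recheck_autovac(relid) ? "vacuum" : "skip");
    return 0;
}

Compiled as plain C, this only prints which simulated tables would be vacuumed and when the stats get re-read; the point is that a refresh happens solely when the cached snapshot no longer settles the question.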
> >>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> + /* We might be better to refresh stats */
> >>>>>>>>>>>>>>> + use_existing_stats = false;
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>> + else
> >>>>>>>>>>>>>>> + {
> >>>>>>>>>>>>>>> - heap_freetuple(classTup);
> >>>>>>>>>>>>>>> + heap_freetuple(classTup);
> >>>>>>>>>>>>>>> + /* The relid has already vacuumed, so we might be better to use exiting stats */
> >>>>>>>>>>>>>>> + use_existing_stats = true;
> >>>>>>>>>>>>>>> + }
> >>>>>>>>>>>>>>> With that patch, the autovacuum process refreshes the stats in the next check if it finds out that the table still needs to be vacuumed. But I guess it's not necessarily true because the next table might be vacuumed already. So I think we might want to always use the existing for the first check. What do you think?
> >>>>>>>>>>>>>> Thanks for your comment.
> >>>>>>>>>>>>>> If we assume the case where some workers vacuum on large tables and a single worker vacuum on small tables, the processing performance of the single worker will be slightly lower if the existing statistics are checked every time.
> >>>>>>>>>>>>>> In fact, at first I tried to check the existing stats every time, but the performance was slightly worse in cases with a small number of workers.
> >>>>>>>>>>> Do you have this benchmark result?
> >>>>>>>>>>>>>> (Checking the existing stats is lightweight , but at high frequency, it affects processing performance.) Therefore, at after refresh statistics, determine whether autovac should use the existing statistics.
> >>>>>>>>>>>>> Yeah, since the test you used uses a lot of small tables, if there are a few workers, checking the existing stats is unlikely to return true (no need to vacuum). So the cost of existing stats check ends up being overhead. Not sure how slow always checking the existing stats was, but given that the shared memory based stats collector patch could improve the performance of refreshing stats, it might be better not to check the existing stats frequently like the patch does. Anyway, I think it’s better to evaluate the performance improvement with other cases too.
> >>>>>>>>>>>> Yeah, I would like to see how much the performance changes in other cases. In addition, if the shared-based-stats patch is applied, we won't need to reload a huge stats file, so we will just have to check the stats on shared-mem every time. Perhaps the logic of table_recheck_autovac could be simpler.
> >>>>>>>>>>>>>> BTW, I found some typos in comments, so attache a fixed version.
> >>>>>>>>>>> The patch adds some duplicated codes into table_recheck_autovac(). It's better to make the common function performing them and make table_recheck_autovac() call that common function, to simplify the code.
> >>>>>>>>>> Thanks for your comment.
> >>>>>>>>>> Hmm.. I've cut out the duplicate part.
> >>>>>>>>>> Attach the patch.
> >>>>>>>>>> Could you confirm that it fits your expecting?
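As a rough illustration of the refactoring being asked for here, the standalone C sketch below uses an invented FakeStats struct rather than the real catalog and pgstat structures: the "does this relation still need vacuum or analyze" decision lives in a single helper, which both the existing-stats probe and the post-refresh probe call. The helper name echoes the recheck_relation_needs_vacanalyze function that appears in the later version of the patch; the real signature takes the pg_class tuple, autovacuum options, and freeze ages rather than this toy struct.

/* Standalone sketch of the common-helper shape; all values are stand-ins. */
#include <stdbool.h>
#include <stdio.h>

typedef struct FakeStats
{
    int         dead_tuples;
    int         changes_since_analyze;
} FakeStats;

/*
 * Single place that encodes the "still needs vacuum/analyze?" decision,
 * so the existing-stats path and the refreshed-stats path share it.
 */
static void
recheck_relation_needs_vacanalyze(const FakeStats *stats,
                                  bool *dovacuum, bool *doanalyze)
{
    *dovacuum = stats->dead_tuples > 50;            /* made-up threshold */
    *doanalyze = stats->changes_since_analyze > 50; /* made-up threshold */
}

int
main(void)
{
    FakeStats   existing = {10, 80};    /* what the cached snapshot says */
    FakeStats   refreshed = {60, 80};   /* what a re-read of the stats says */
    bool        dovacuum,
                doanalyze;

    /* Probe the stats we already have ... */
    recheck_relation_needs_vacanalyze(&existing, &dovacuum, &doanalyze);
    printf("existing stats : vacuum=%d analyze=%d\n", dovacuum, doanalyze);

    /* ... and call the same helper again after a (simulated) refresh. */
    recheck_relation_needs_vacanalyze(&refreshed, &dovacuum, &doanalyze);
    printf("refreshed stats: vacuum=%d analyze=%d\n", dovacuum, doanalyze);
    return 0;
}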
> >>>>>>>>> Yes, thanks for updataing the patch! Here are another review comments.
> >>>>>>>>> + shared = pgstat_fetch_stat_dbentry(InvalidOid);
> >>>>>>>>> + dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
> >>>>>>>>> When using the existing stats, ISTM that these are not necessary and we can reuse "shared" and "dbentry" obtained before. Right?
> >>>>>>>> Yeah, but unless autovac_refresh_stats() is called, these functions read the information from the local hash table without re-read the stats file, so the process is very light. Therefore, I think, it is better to keep the current logic to keep the code simple.
> >>>>>>>>> + /* We might be better to refresh stats */
> >>>>>>>>> + use_existing_stats = false;
> >>>>>>>>> I think that we should add more comments about why it's better to refresh the stats in this case.
> >>>>>>>>> + /* The relid has already vacuumed, so we might be better to use existing stats */
> >>>>>>>>> + use_existing_stats = true;
> >>>>>>>>> I think that we should add more comments about why it's better to reuse the stats in this case.
> >>>>>>>> I added comments.
> >>>>>>>> Attache the patch.
> >>>>>>> Thank you for updating the patch. Here are some small comments on the latest (v4) patch.
> >>>>>>> + * So if the last time we checked a table that was already vacuumed after
> >>>>>>> + * refres stats, check the current statistics before refreshing it.
> >>>>>>> + */
> >>>>>>> s/refres/refresh/
> >>>>> Thanks! fixed.
> >>>>> Attached the patch.
> >>>>>>> -----
> >>>>>>> +/* Counter to determine if statistics should be refreshed */
> >>>>>>> +static bool use_existing_stats = false;
> >>>>>>> +
> >>>>>>> I think 'use_existing_stats' can be declared within table_recheck_autovac().
> >>>>>>> -----
> >>>>>>> While testing the performance, I realized that the statistics are reset every time vacuumed one table, leading to re-reading the stats file even if 'use_existing_stats' is true. Please refer that vacuum() eventually calls AtEOXact_PgStat() which calls to pgstat_clear_snapshot().
> >>>>>> Good catch!
> >>>>>>> I believe that's why the performance of the method of always checking the existing stats wasn’t good as expected. So if we save the statistics somewhere and use it for rechecking, the results of the performance benchmark will differ between these two methods.
> >>>>> Thanks for you checks.
> >>>>> But, if a worker did vacuum(), that means this worker had determined need vacuum in the table_recheck_autovac(). So, use_existing_stats set to false, and next time, refresh stats. Therefore I think the current patch is fine, as we want to avoid unnecessary refreshing of statistics before the actual vacuum(), right?
> >>>> Yes, you're right.
> >>>> When I benchmarked the performance of the method of always checking existing stats I edited your patch so that it sets use_existing_stats = true even if the second check is false (i.g., vacuum is needed).
> >>>> But it doesn't evaluate the performance of that method rightly as the stats snapshot is cleared every time vacuum. Given you had similar results, I guess you used a similar way when evaluating it, is it right? If so, it’s better to fix this issue and see how the performance benchmark results will differ.
> >>>> For example, the results of the test case with 10000 tables and 1 autovacuum worker I reported before was:
> >>>> 10000 tables:
> >>>> autovac_workers 1 : 158s,157s, 290s
> >>>> But after fixing that issue in the third method (always checking the existing stats), the results are:
> >>> Could you tell me how you fixed that issue? You copied the stats to somewhere as you suggested or skipped pgstat_clear_snapshot() as I suggested?
> >> I used the way you suggested in this quick test; skipped pgstat_clear_snapshot().
> >>> Kasahara-san seems not to like the latter idea because it might cause bad side effect. So we should use the former idea?
> >> Not sure. I'm also concerned about the side effect but I've not checked yet.
> >> Since probably there is no big difference between the two ways in terms of performance I'm going to see how the performance benchmark result will change first.
> > I've tested performance improvement again. From the left the execution time of the current HEAD, Kasahara-san's patch, the method of always checking the existing stats (using approach suggested by Fujii-san), in seconds.
> > 1000 tables:
> > autovac_workers 1 : 13s, 13s, 13s
> > autovac_workers 2 : 6s, 4s, 4s
> > autovac_workers 3 : 3s, 4s, 3s
> > autovac_workers 5 : 3s, 3s, 2s
> > autovac_workers 10: 2s, 3s, 2s
> > 5000 tables:
> > autovac_workers 1 : 71s, 71s, 72s
> > autovac_workers 2 : 37s, 32s, 32s
> > autovac_workers 3 : 29s, 26s, 26s
> > autovac_workers 5 : 20s, 19s, 18s
> > autovac_workers 10: 13s, 8s, 8s
> > 10000 tables:
> > autovac_workers 1 : 158s,157s, 159s
> > autovac_workers 2 : 80s, 53s, 78s
> > autovac_workers 3 : 75s, 67s, 67s
> > autovac_workers 5 : 61s, 42s, 42s
> > autovac_workers 10: 69s, 26s, 25s
> > 20000 tables:
> > autovac_workers 1 : 379s, 380s, 389s
> > autovac_workers 2 : 236s, 232s, 233s
> > autovac_workers 3 : 222s, 181s, 182s
> > autovac_workers 5 : 212s, 132s, 139s
> > autovac_workers 10: 317s, 91s, 89s
> > I don't see a big difference between Kasahara-san's patch and the method of always checking the existing stats.

Thanks!

> Thanks for doing the benchmark!
> This benchmark result makes me think that we don't need to tweak AtEOXact_PgStat() and can use Kasahara-san approach.

That's good news :)

> + /*
> + * The relid had not yet been vacuumed. That means, it is unlikely that the
> + * stats that this worker currently has are updated by other worker's.
> + * So we might be better to refresh the stats in the next this recheck.
> + */
> + use_existing_stats = false;
> I think that this comment should be changed to something like the following. Thought?

I think your comment is more reasonable. I replaced the comments.

> When we decide to do vacuum or analyze, the existing stats cannot be reused in the next cycle because it's cleared at the end of vacuum or analyze (by AtEOXact_PgStat()).
> + /*
> + * The relid had already vacuumed.
> + * That means, that for the stats that this worker currently has, the info of tables that this worker will process may have been updated by other workers with information that has already been vacuumed or analyzed.
> + * So we might be better to reuse the existing stats in the next this recheck.
> + */
> + use_existing_stats = true;
> Maybe it's better to change this comment to something like the following?

I replaced the comments.

> If neither vacuum nor analyze is necessary, the existing stats is not cleared and can be reused in the next cycle.
> + if (use_existing_stats)
> + {
> + recheck_relation_needs_vacanalyze(relid, classForm, avopts,
> + effective_multixact_freeze_max_age,
> + &dovacuum, &doanalyze, &wraparound);
> Personally I'd like to add the assertion test checking "pgStatDBHash != NULL" here, to guarantee that there is the existing stats to reuse when use_existing_stats==true. Because if the future changes of autovacuum code will break that assumption, it's not easy to detect that breakage without that assertion test. Thought?

I think it's nice to have. But if we do so, we have to add a new function to pgstat.c to check whether pgStatDBHash is NULL or not, and I'm not sure that's a reasonable change. And if pgStatDBHash is NULL here, it is not a critical issue, so I'm foregoing the addition of the Assert for now.

> + shared = pgstat_fetch_stat_dbentry(InvalidOid);
> + dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
> If classForm->relisshared is true, only the former needs to be executed. Otherwise, only the latter needs to be executed. Right?

Right. I modified it to check classForm->relisshared and execute only one of them.

Attached the patch.

Best regards,

> Regards,
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com
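For readers following that last change, here is a tiny standalone sketch of the idea; the fetch_dbentry function and FakeDbEntry type are invented stand-ins for the real pgstat_fetch_stat_dbentry API and its result, and only the point that a single lookup per recheck suffices, chosen by classForm->relisshared, is modeled.

/* Standalone sketch: fetch only the stats entry that can contain the relation. */
#include <stdbool.h>
#include <stdio.h>

typedef struct FakeDbEntry
{
    const char *label;
} FakeDbEntry;

static FakeDbEntry shared_entry = {"shared-relations stats"};
static FakeDbEntry mydb_entry = {"current-database stats"};

/* Stand-in for pgstat_fetch_stat_dbentry(InvalidOid) / (MyDatabaseId). */
static FakeDbEntry *
fetch_dbentry(bool shared)
{
    return shared ? &shared_entry : &mydb_entry;
}

static void
recheck(bool relisshared)
{
    /* Only the entry that can actually contain this relation is fetched. */
    FakeDbEntry *dbentry = fetch_dbentry(relisshared);

    printf("relisshared=%d -> %s\n", relisshared, dbentry->label);
}

int
main(void)
{
    recheck(true);              /* e.g. a shared catalog */
    recheck(false);             /* an ordinary table in the current database */
    return 0;
}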