New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | New IndexAM API controlling index vacuum strategies |
Date | |
Msg-id | CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com |
Responses |
Re: New IndexAM API controlling index vacuum strategies
|
List | pgsql-hackers |
Hi all,

I've started this separate thread from [1] for discussing the general API design of index vacuum.

Summary:

* Call ambulkdelete and amvacuumcleanup even when INDEX_CLEANUP is false, and leave it to the index AM whether or not to skip them.
* Add a new index AM API, amvacuumstrategy(), which asks the index AM for its strategy before calling ambulkdelete.
* Whether or not to remove garbage tuples from the heap depends on multiple factors, including the INDEX_CLEANUP option and the answers of amvacuumstrategy() for each index AM.

The first point is to fix the inappropriate behavior discussed on the thread [1]. The second and third points are to introduce a general framework for future extensibility. User-visible behavior is not changed by this change.

The new index AM API, amvacuumstrategy(), is called before bulkdelete() for each index and asks the index for its bulk-deletion strategy. Through this API, lazy vacuum asks, "Hey index X, I collected garbage heap tuples during heap scanning, how urgent is vacuuming for you?", and the index answers either "it's urgent" when it wants to do bulk-deletion or "it's not urgent, I can skip it". The point of this proposal is to isolate heap vacuum and index vacuum for each index so that we can employ different strategies for each index.

Lazy vacuum can decide whether or not to do heap clean based on the answers from the indexes. By default, if all indexes answer 'yes' (meaning they will do bulkdelete()), lazy vacuum can do heap clean. On the other hand, if even one index answers 'no' (meaning it will not do bulkdelete()), lazy vacuum doesn't do heap clean. Lazy vacuum would also be able to require indexes to do bulkdelete() for some reason, such as the user specifying the INDEX_CLEANUP option. It's something like saying "Hey index X, you answered not to do bulkdelete() but since heap clean is necessary for me please don't skip bulkdelete()".

Currently, if the INDEX_CLEANUP option is not set (i.e.,
VACOPT_TERNARY_DEFAULT in the code), it's treated as true and heap clean is done. But with this patch we use the default as a neutral state ('smart' mode). This neutral state could be "on" or "off" depending on several factors, including the answers of amvacuumstrategy(), the table status, and the user's request. In this context, specifying INDEX_CLEANUP would mean forcing the neutral state to "on" or "off" at the user's request. The table status that could influence the decision could concretely be, for instance:

* Removing LP_DEAD accumulation due to skipping bulkdelete() for a long time.
* Making pages all-visible for index-only scans.

Also, there are potential enhancements using this API:

* If the bottom-up index deletion feature [2] is introduced, individual indexes could be in different situations in terms of dead tuple accumulation; some indexes on the table can delete their garbage index tuples without bulkdelete(). Doing bulkdelete() for such indexes would then not be efficient. This problem is solved by this proposal because we can do bulkdelete() for a subset of the indexes on the table.
* If the retail index deletion feature [3] is introduced, we can make the return value of amvacuumstrategy() a ternary value: "do_bulkdelete", "do_indexscandelete", and "no".
* We probably can introduce a threshold on the number of dead tuples to control whether or not to do index tuple bulk-deletion (like a bulkdelete() version of vacuum_cleanup_index_scale_factor). In the case where the amount of dead tuples is slightly larger than maintenance_work_mem, the second call to bulkdelete() is made with only a small number of dead tuples, which is inefficient. This problem is also solved by this proposal, by allowing a subset of indexes to skip bulkdelete() if the number of dead tuples doesn't exceed the threshold.

I've attached the PoC patch for the above idea.
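The default decision rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the attached patch: the enum names, the function decide_heap_clean(), and the choice that an explicit INDEX_CLEANUP setting simply overrides the index answers are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the answer an index gives via amvacuumstrategy(). */
typedef enum
{
    STRATEGY_SKIP,          /* "it's not urgent, I can skip it" */
    STRATEGY_BULKDELETE     /* "it's urgent", the index wants bulkdelete() */
} IndexStrategyAnswer;

/* Hypothetical stand-in for the three states of the INDEX_CLEANUP option. */
typedef enum
{
    CLEANUP_DEFAULT,        /* not specified: the neutral 'smart' mode */
    CLEANUP_OFF,            /* user asked to skip index cleanup */
    CLEANUP_ON              /* user asked for index cleanup */
} IndexCleanupOpt;

/*
 * Decide whether lazy vacuum does heap clean.
 *
 * Assumption: an explicit user setting wins outright. In the neutral
 * state, heap clean happens only if every index answered that it will
 * do bulkdelete(); a single "skip" answer disables it.
 */
static bool
decide_heap_clean(IndexCleanupOpt opt,
                  const IndexStrategyAnswer *answers, int nindexes)
{
    assert(nindexes >= 0);
    assert(nindexes == 0 || answers != NULL);

    if (opt == CLEANUP_ON)
        return true;
    if (opt == CLEANUP_OFF)
        return false;

    for (int i = 0; i < nindexes; i++)
    {
        if (answers[i] == STRATEGY_SKIP)
            return false;
    }
    return true;
}
```

In this sketch a table with no indexes always gets heap clean in the neutral state, since there is no index that could answer 'no'. Table status factors such as LP_DEAD accumulation are left out for brevity.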
By default, lazy vacuum chooses the bulkdelete strategy based on the answers of amvacuumstrategy(), so it can be either true or false (although it's always true in the current patch). For amvacuumcleanup(), however, there is no neutral state; lazy vacuum treats the default as true.

Comments and feedback are very welcome.

Regards,

[1] https://www.postgresql.org/message-id/20200415233848.saqp72pcjv2y6ryi%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/CAH2-Wzm%2BmaE3apHB8NOtmM%3Dp-DO65j2V5GzAWCOEEuy3JZgb2g%40mail.gmail.com
[3] https://www.postgresql.org/message-id/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/