Re: xid wraparound danger due to INDEX_CLEANUP false - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Msg-id: CAH2-Wz=+txMTk8TZ5+ipuvpiQfDVrE4Nt6hx7MjDuA66rE5Brw@mail.gmail.com
In response to: Re: xid wraparound danger due to INDEX_CLEANUP false (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Fri, Nov 20, 2020 at 3:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I had missed your bottom-up index deletion patch but it's a promising
> improvement. With that patch, the number of dead tuples in individual
> indexes may differ. So it's important that we make index vacuuming a
> per-index thing.

Right, but it's important to remember that even bottom-up index deletion isn't really special in theory. Bottom-up index deletion is "just" a reliable version of the existing LP_DEAD index deletion mechanism (which has been around since Postgres 8.2). In theory it doesn't change the fundamental nature of the problem. In practice it does, because it makes it very obvious to pgsql-hackers that indexes on the same table can have very different needs from VACUUM, and the actual differences we see are much bigger now. Even so, big differences across indexes on the same table are not a new thing. (Actually, you can see this even on the master branch in Victor's bottom-up deletion benchmark, where the primary key index doesn't grow on master, even after 8 hours.)

The bottom-up index deletion patch (and the enhancements we're talking about here, for VACUUM itself) are based on the "generational hypothesis" that underlies generational garbage collection. The philosophy is the same. See:

https://plumbr.io/handbook/garbage-collection-in-java/generational-hypothesis

In theory, "most garbage comes from new objects" is "just" an empirical observation that may or may not hold for each workload/Java program/whatever. In practice it is important enough to be a big part of how every modern garbage collector works -- it's almost always true, and even when it isn't true it doesn't actually hurt to assume that it is and be wrong. I believe that we have to take a holistic view of the problem to make real progress.
Andres said something similar in a recent blog post:

https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462#interlude-removing-the-need-for-recentglobalxminhorizon

"In most workloads the majority of accesses are to live tuples, and when encountering non-live tuple versions they are either very old, or very new."

(This was just a coincidence, but it was good to see that he made the same observation.)

> Given that patch, it seems to me that it would be better to ask
> individual index AMs about the need for bulkdelete before calling it.
> That is, passing the VACUUM options and the number of collected dead
> tuples etc. to the index AM, we ask the index AM via a new index AM
> API whether it wants to do bulkdelete or not. We call bulkdelete only
> for the indexes that answered 'yes'. If we got 'no' from any one of
> the indexes, we cannot have a second heap pass. INDEX_CLEANUP is not
> enforceable. When INDEX_CLEANUP is set to false, we expect index AMs
> to return 'no' unless they have a special reason for needing
> bulkdelete.

I don't have a very detailed idea of the interface or anything. There are a few questions that naturally present themselves that I don't have good answers to right now. Obviously vacuumlazy.c will only treat this feedback from each index as advisory. So what happens when 50% of the indexes say yes and 50% say no? This is a subproblem that must be solved as part of this work. Ideally it will be solved by you. :-)

> One possible benefit of this idea, even without the bottom-up index
> deletion patch, would be something like
> vacuum_index_cleanup_scale_factor for bulkdelete. For example, in the
> case where the number of dead tuples is slightly larger than
> maintenance_work_mem, the second call to bulkdelete will be made with
> a small number of dead tuples, which is less efficient. If an index
> AM is able to decide not to do bulkdelete by comparing the number of
> dead tuples to a threshold, it can avoid such a bulkdelete call.

I agree. Actually, I thought the same thing myself, even before I realized that bottom-up index deletion was possible.

> Also, as a future extension, once we have a retail index deletion
> feature, we might be able to make that API return a ternary value:
> 'no', 'do_bulkdelete', 'do_indexscandelete', so that the index AM can
> choose the appropriate method of index deletion based on the
> statistics.

I agree again! We may eventually be able to make autovacuum run very frequently against each table in many important cases, with each VACUUM taking very little wall-clock time. We don't have to change the fundamental design to fix most of the current problems. I suspect that the "top down" nature of VACUUM is sometimes helpful. We just need to compensate when this design is insufficient. Getting the "best of both worlds" is possible.

> But to make index vacuuming a per-index thing, we need to deal with
> the problem that we cannot know which indexes of the table still have
> index tuples pointing to the collected dead tuples. For example, if
> an index always says 'no' (it doesn't need bulkdelete, so we need to
> keep the dead line pointers), the collected dead tuples might already
> be marked as LP_DEAD and there might no longer be index tuples
> pointing to them in the other index AMs. In that case we don't want
> to call bulkdelete for the other indexes. Probably we can have
> additional statistics, like the number of dead tuples in individual
> indexes, so that they can determine the need for bulkdelete. But it's
> not a comprehensive solution.

Right. Maybe we don't ask the index AMs for discrete yes/no answers. Maybe we can ask them for a continuous answer, such as a value between 0.0 and 1.0 that represents the urgency/bloat, or something like that.
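To make the threshold and ternary ideas from the quoted proposal concrete, here is a minimal sketch in C. None of this is actual PostgreSQL code: the enum, struct, and function names (IndexVacuumStrategy, IndexVacuumInfo, choose_strategy, cleanup_scale_factor) are invented for illustration, and the 2% retail-deletion cutoff is an arbitrary placeholder.

```c
#include <stdbool.h>

/* Hypothetical ternary answer from an index AM (invented names) */
typedef enum IndexVacuumStrategy
{
    INDVAC_NONE,            /* skip index vacuuming entirely */
    INDVAC_BULKDELETE,      /* full index scan deleting collected TIDs */
    INDVAC_INDEXSCANDELETE  /* retail deletion of individual index tuples */
} IndexVacuumStrategy;

/* Hypothetical per-index information passed down by vacuumlazy.c */
typedef struct IndexVacuumInfo
{
    bool    index_cleanup;          /* INDEX_CLEANUP option in effect? */
    double  num_dead_tuples;        /* dead TIDs collected by the heap scan */
    double  num_index_tuples;       /* current size of the index */
    double  cleanup_scale_factor;   /* analogous to a per-index
                                     * vacuum_index_cleanup_scale_factor */
} IndexVacuumInfo;

/*
 * Invented decision routine: an AM that supports retail deletion might
 * prefer it when only a small fraction of the index is dead, fall back
 * to a bulk delete when many TIDs are dead, and decline entirely when
 * index cleanup is disabled or the dead fraction is below its threshold.
 */
IndexVacuumStrategy
choose_strategy(const IndexVacuumInfo *info, bool am_can_retail_delete)
{
    double  dead_fraction;

    if (!info->index_cleanup)
        return INDVAC_NONE;
    if (info->num_index_tuples <= 0)
        return INDVAC_NONE;

    dead_fraction = info->num_dead_tuples / info->num_index_tuples;
    if (dead_fraction < info->cleanup_scale_factor)
        return INDVAC_NONE;     /* not worth an index pass yet */
    if (am_can_retail_delete && dead_fraction < 0.02)
        return INDVAC_INDEXSCANDELETE;
    return INDVAC_BULKDELETE;
}
```

The point of the sketch is only that the decision can be made per index from cheap statistics, which is exactly what makes an advisory per-index API plausible.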
And so the final yes/no answer that really does have to be made for the table as a whole (does VACUUM do a second pass over the heap to make LP_DEAD items into LP_UNUSED items?) can at least consider the worst case for each index. And maybe the average case, too. (I am just making random suggestions to stimulate discussion. Don't take these specific suggestions about the AM interface too seriously.)

> > If you are able to pursue this project, in whole or in part, I would
> > definitely be supportive of that. I may be able to commit it. I think
> > that this project has many benefits, not just one or two. It seems
> > strategic.
>
> Thanks, that's really helpful. I'm going to work on that. Since things
> became complicated by these two features that I proposed, I'll do my
> best to sort out the problem and improve it in PG14.

Excellent! Thank you.

--
Peter Geoghegan
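The worst-case/average aggregation floated above could be sketched as follows. Again, this is purely hypothetical: the function name, the two thresholds, and the idea of the AMs handing back scores in an array are all invented for illustration, not an actual vacuumlazy.c interface.

```c
#include <stdbool.h>

/*
 * Hypothetical aggregation: each index AM reports an urgency/bloat
 * score in [0.0, 1.0]. The table-level yes/no decision (do we make a
 * second heap pass to turn LP_DEAD items into LP_UNUSED?) considers
 * the worst case for any one index first, and the average across all
 * indexes as a fallback.
 */
bool
want_second_heap_pass(const double *scores, int nindexes,
                      double worst_threshold, double avg_threshold)
{
    double  worst = 0.0;
    double  sum = 0.0;
    int     i;

    if (nindexes == 0)
        return false;           /* no indexes: nothing to decide */

    for (i = 0; i < nindexes; i++)
    {
        if (scores[i] > worst)
            worst = scores[i];
        sum += scores[i];
    }

    /* One badly bloated index is enough to justify the heap pass ... */
    if (worst >= worst_threshold)
        return true;
    /* ... otherwise require moderate urgency across the board */
    return (sum / nindexes) >= avg_threshold;
}
```

Under this scheme the "50% say yes, 50% say no" question becomes a question about thresholds rather than about counting votes, which may be easier to reason about.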