It's quite common to load a lot of data, and then do some reads for a while (setting hint bits and flushing them to disk), and then do a VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the pages again. Also, if I remember correctly, Robert went to significant effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM bits consistent. Maybe this was all discussed before?
All of these hint bits will have a bit more of a performance impact after checksums are introduced (for those who use them in conjunction with large data loads), so I'm looking for some simple ways to mitigate those effects. What kind of worst-case tests could I construct to see if there are worrying performance effects from removing these hint bits?
Regards, Jeff Davis
I completely agree. In fact, that is the problem we are trying to solve in our patch (https://commitfest.postgresql.org/action/patch_view?id=991). Essentially, we are trying to mitigate the expense of maintaining hint bits in the case where the user loads a lot of data, does some operations such as SELECT, and then deletes it all. We maintain a cache that can be used to fetch the commit status of XMIN or XMAX instead of relying on hint bits. As the cache is a single frame, it needs no replacement algorithm, and a cache lookup is pretty cheap.
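To make the single-frame idea concrete, here is a rough sketch. It is not the code from the patch, and every name in it (XidFrameCache, cached_status, slow_lookup, FRAME_XIDS) is made up for illustration. It assumes the frame covers one aligned range of XIDs at 2 bits per XID, and that slow_lookup() stands in for the expensive authoritative lookup. Since there is only one frame, a lookup outside the current range simply repoints the frame and clears it, so there is no eviction policy to get wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical; this is not PostgreSQL's API. */
typedef uint32_t Xid;

typedef enum {
    XS_UNKNOWN = 0,      /* not cached yet */
    XS_IN_PROGRESS = 1,  /* never cached: may still change */
    XS_COMMITTED = 2,
    XS_ABORTED = 3
} XidStatus;

#define FRAME_XIDS 4096  /* assumed number of XIDs covered by the frame */

typedef struct {
    bool valid;
    Xid base;                      /* first XID covered by the frame */
    uint8_t bits[FRAME_XIDS / 4];  /* 2 status bits per XID */
    unsigned long hits, misses;
} XidFrameCache;

/* Stand-in for the expensive lookup. Toy rule for the sketch only:
 * even XIDs committed, odd XIDs aborted. */
static XidStatus slow_lookup(Xid xid)
{
    return (xid % 2 == 0) ? XS_COMMITTED : XS_ABORTED;
}

static XidStatus frame_get(const XidFrameCache *c, Xid xid)
{
    uint32_t off = xid - c->base;
    return (XidStatus) ((c->bits[off / 4] >> ((off % 4) * 2)) & 3);
}

static void frame_set(XidFrameCache *c, Xid xid, XidStatus s)
{
    uint32_t off = xid - c->base;
    unsigned shift = (off % 4) * 2;
    c->bits[off / 4] = (uint8_t) ((c->bits[off / 4] & ~(3u << shift))
                                  | ((unsigned) s << shift));
}

static XidStatus cached_status(XidFrameCache *c, Xid xid)
{
    Xid base = xid - (xid % FRAME_XIDS);
    XidStatus s;

    /* Single frame: on a range change, repoint and clear it. */
    if (!c->valid || c->base != base) {
        memset(c->bits, 0, sizeof c->bits);
        c->base = base;
        c->valid = true;
    }

    s = frame_get(c, xid);
    if (s != XS_UNKNOWN) {
        c->hits++;
        return s;
    }

    c->misses++;
    s = slow_lookup(xid);
    assert(s != XS_UNKNOWN);

    /* Only final states are safe to remember. */
    if (s == XS_COMMITTED || s == XS_ABORTED)
        frame_set(c, xid, s);
    return s;
}
```

A zero-initialized XidFrameCache is a valid empty cache. The hit path is one subtraction, a shift, and a mask, which is the sense in which the lookup is cheap. The cost shows up when lookups bounce between distant XID ranges, because each switch throws away the whole frame.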
I agree with the removal of PD_ALL_VISIBLE. AFAIK (pardon me if I am wrong; I have been researching this while following the thread), PD_ALL_VISIBLE was really useful when the VM bits were not really crash-safe, and crashes could lead to redo setting the bit on the heap pages.