That clearly explains the problem. But this got me thinking: what if we do both index and heap optimization at the same time?
Meaning that moving a heap tuple (as done to compact/defragment heap pages) would be followed by moving its index entry as well: a new index tuple would be created at the right place in the index file (the one that had its dead tuples removed and was internally defragmented, aka vacuumed), and the old index tuple would be deleted immediately after the heap tuple is moved. I think this could both solve the bloating problem and keep the table and indexes in optimum shape. All of this would be done lazily, so that these operations only run when the server is not overwhelmed (or just using whatever logic our lazy vacuuming already uses). What do you think?
On Sun, 21 Jul 2024 at 04:00, Ahmed Yarub Hani Al Nuaimi <ahmedyarubhani@gmail.com> wrote: > 2- Can you point me to a resource explaining why this might lead to index bloating?
No resource links, but if you move a tuple to another page then you must also adjust the index. If you have no exclusive lock on the table, then you must assume older transactions still need the old tuple version, so you need to create another index entry rather than re-pointing the existing index entry's ctid to the new tuple version. It's not hard to imagine that would cause the index to become larger if you had to move some decent portion of the tuples to other pages.
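To make that concrete, here is a small sketch (table and column names are made up) showing that a new tuple version gets a new ctid, which is why the index needs a whole new entry rather than an in-place repoint while old snapshots may still need the old version:

```sql
CREATE TABLE t (id int PRIMARY KEY, v text);
INSERT INTO t VALUES (1, 'x');

SELECT ctid FROM t WHERE id = 1;  -- e.g. (0,1)

-- Even a no-op UPDATE creates a new tuple version (MVCC). If the new
-- version lands on another page, the btree must get a second entry
-- pointing at it; the old entry stays until vacuum removes it.
UPDATE t SET v = v WHERE id = 1;
SELECT ctid FROM t WHERE id = 1;  -- e.g. (0,2)
```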
FWIW, I think it would be good if we had some easier way to compact tables without blocking concurrent users. My primary interest in TID Range Scans was to allow easier identification of tuples near the end of the heap that could be manually UPDATEd after a vacuum to allow the heap to be shrunk during the next vacuum.
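For anyone wanting to try that manually today (PG14+ for TID Range Scans), a rough sketch, assuming a table t where pages from 1000 onward are mostly empty after a bulk DELETE and vacuum:

```sql
-- Identify tuples living near the end of the heap via a TID Range Scan
SELECT ctid, id FROM t WHERE ctid >= '(1000,0)'::tid;

-- A no-op UPDATE rewrites those tuples; with free space available on
-- earlier pages, the new versions may land lower in the heap. (Note a
-- HOT update can keep the tuple on the same page, so this can take
-- some care in practice.)
UPDATE t SET id = id WHERE ctid >= '(1000,0)'::tid;

-- Once the old versions are dead, the next vacuum can truncate the
-- now-empty pages from the end of the relation.
VACUUM t;
```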