Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() - Mailing list pgsql-bugs
From | Alena Rybakina |
---|---|
Subject | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
Date | |
Msg-id | 0a994343-c552-4535-a9cf-b4caa4edc1e8@yandex.ru |
In response to | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() (Alena Rybakina <lena.ribackina@yandex.ru>) |
Responses |
Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
|
List | pgsql-bugs |
On 02.05.2024 19:52, Peter Geoghegan wrote:
> On Sat, Apr 27, 2024 at 10:38 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
>> In 17, we don't ever get a new HTSV_Result, so if the tuple is not removed, it would be because HeapTupleSatisfiesVacuumHorizon() returned HEAPTUPLE_RECENTLY_DEAD and, if GlobalVisTestIsRemovableXid() was called, dead_after did not precede GlobalVisState->maybe_needed. This tuple, during this vacuum of the relation, would never be determined to be HEAPTUPLE_DEAD or it would have been removed.
>
> That makes sense.
>
>> It will always be HEAPTUPLE_RECENTLY_DEAD in 17 and in <= 16, if HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, we wouldn't call heap_prepare_freeze_tuple() because of the retry loop.
>
> The retry loop exists precisely because heap_prepare_freeze_tuple() isn't prepared to deal with HEAPTUPLE_DEAD tuples. So I agree that that won't be allowed to happen on versions that have the retry loop (14 - 16).
>
>> So, it can't happen in back branches. Let's just address 17. Help me understand how this can happen in 17.
>
> Just to be clear, I never said that it was possible in 17. If I somehow implied it, then I didn't mean to.

Hi! I also investigated this issue and reproduced it with a test added to the isolation tests: I inserted 2 tuples, deleted them, ran vacuum, and printed the tuple_deleted and dead_tuples statistics (I attached test c to this email as a patch). For the first 400 or more iterations, I got this result:

n_dead_tup|n_live_tup|n_tup_del
----------+----------+---------
         0|         0|        0
(1 row)

After 400 or more iterations, the output started to differ:

n_dead_tup|n_live_tup|n_tup_del
----------+----------+---------
-         0|         0|        0
+         2|         0|        0
(1 row)

I debugged this and found that the test reports 0 dead tuples when the xmax of the tuple precedes GlobalVisTempRels.maybe_needed. In the code, this is the branch taken in heap_prune_satisfies_vacuum:

    else if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
    {
        res = HEAPTUPLE_DEAD;
    }

But when GlobalVisTempRels.maybe_needed is equal to the xmax of the tuple, vacuum does not touch the tuple, because heap_prune_satisfies_vacuum returns HEAPTUPLE_RECENTLY_DEAD for it.

Unfortunately, I have not found an explanation for why GlobalVisTempRels.maybe_needed stops advancing after 400 or more iterations. I'm still studying it. Perhaps this information will help you.

I reproduced the problem on REL_16_STABLE.

I reproduced it on the master branch as well, but with a more complex test: I added 700 tuples to the table, deleted half of them, and then ran vacuum. I expected to get only 350 live tuples and 0 dead and deleted tuples, but after 800 iterations I got 350 dead tuples and 350 live tuples:

n_dead_tup|n_live_tup|n_tup_del
----------+----------+---------
-         0|       350|        0
+       350|       350|        0
(1 row)
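To make the boundary case from the debugging notes above concrete, here is a minimal sketch. It assumes the removability decision reduces to a strict "precedes" comparison of the tuple's dead_after (xmax) against maybe_needed. It is not the PostgreSQL implementation: the real GlobalVisTestIsRemovableXid() works on GlobalVisState and may do more than a single comparison, and every name below (SketchXid, SketchStatus, sketch_classify) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a full transaction id; not a PostgreSQL type. */
typedef uint64_t SketchXid;

/* Hypothetical result codes mirroring the two outcomes discussed above. */
typedef enum
{
    SKETCH_RECENTLY_DEAD,   /* vacuum leaves the tuple in place */
    SKETCH_DEAD             /* vacuum may remove the tuple */
} SketchStatus;

/*
 * Assumption: a deleted tuple is removable only when its dead_after xid
 * strictly precedes the maybe_needed horizon. Equality is NOT enough,
 * which is the case observed in the failing iterations.
 */
static SketchStatus
sketch_classify(SketchXid dead_after, SketchXid maybe_needed)
{
    if (dead_after < maybe_needed)
        return SKETCH_DEAD;
    return SKETCH_RECENTLY_DEAD;
}
```

Under this model, a horizon that stops one step short (maybe_needed equal to the tuple's xmax) keeps the tuple counted as dead but not removed, which would match the n_dead_tup values in the failing runs.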
I have added other steps to the test, but so far I have either not seen any failures there or the test has not reached them.
Just in case, I ran the test with this bash command:
for i in `seq 2000`;do echo "ITER $i"; make -s installcheck -C src/test/isolation/ || break;done
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company