Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers
From | Ashutosh Sharma |
---|---|
Subject | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date | |
Msg-id | CAE9k0PmeC0kM-FoA4P5XFWcgwtQToFk07RaTCYeiYZLdtYrXHg@mail.gmail.com Whole thread Raw |
In response to | Re: recovering from "found xmin ... from before relfrozenxid ..." (Ashutosh Sharma <ashu.coek88@gmail.com>) |
List | pgsql-hackers |
Attached is the new version of patch that addresses the comments from Asim Praveen and Masahiko-san. It also improves the documentation to some extent. On Tue, Aug 18, 2020 at 1:46 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Hello Masahiko-san, > > I've spent some more time trying to understand the code in > lazy_scan_heap function to know under what all circumstances a VACUUM > can fail with "found xmin ... before relfrozenxid ..." error for a > tuple whose xmin is behind relfrozenxid. Here are my observations: > > 1) It can fail with this error for a live tuple > > OR, > > 2) It can also fail with this error if a tuple (that went through > update) is marked as HEAP_HOT_UPDATED or HEAP_ONLY_TUPLE. > > OR, > > 3) If there are any concurrent transactions, then the tuple might be > marked as HEAPTUPLE_INSERT_IN_PROGRESS or HEAPTUPLE_DELETE_IN_PROGRESS > or HEAPTUPLE_RECENTLY_DEAD in which case also VACUUM can fail with > this error. > > Now, AFAIU, as we will be dealing with a damaged table, the chances of > point #3 being the cause of this error looks impossible in our case > because I don't think we will be doing anything in parallel when > performing surgery on a damaged table, in fact we shouldn't be doing > any such things. However, it is quite possible that reason #2 could > cause VACUUM to fail with this sort of error, but, as we are already > skipping redirected item pointers in heap_force_common(), I think, we > would never be marking HEAP_HOT_UPDATED tuple as frozen and I don't > see any problem in marking HEAP_ONLY_TUPLE as frozen. So, probably, we > may not need to handle point #2 as well. > > Further, I also don't see VACUUM reporting this error for a tuple that > has been moved from one partition to another. So, I think we might not > need to do any special handling for a tuple that got updated and its > new version was moved to another partition. > > If you feel I am missing something here, please correct me. Thank you. > > Moreover, while I was exploring on above, I noticed that in > lazy_scan_heap(), before we call HeapTupleSatisfiesVacuum() we check > for a redirected item pointers and if any redirected item pointer is > detected we do not call HeapTupleSatisfiesVacuum(). So, not sure how > HeapTupleSatisfiesVacuum would ever return a dead tuple that is marked > with HEAP_HOT_UPDATED. I am referring to the following code in > lazy_scan_heap(). > > for (offnum = FirstOffsetNumber; > offnum <= maxoff; > offnum = OffsetNumberNext(offnum)) > { > ItemId itemid; > > itemid = PageGetItemId(page, offnum); > > ............. > ............. > > > /* Redirect items mustn't be touched */ <-- this check > would bypass the redirected item pointers from being checked for > HeapTupleSatisfiesVacuum. > if (ItemIdIsRedirected(itemid)) > { > hastup = true; /* this page won't be truncatable */ > continue; > } > > .............. > .............. > > switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf)) > { > case HEAPTUPLE_DEAD: > > if (HeapTupleIsHotUpdated(&tuple) || > HeapTupleIsHeapOnly(&tuple) || > params->index_cleanup == VACOPT_TERNARY_DISABLED) > nkeep += 1; > else > tupgone = true; /* we can delete the tuple */ > .............. > .............. > } > > > So, the point is, would HeapTupleIsHotUpdated(&tuple) ever be true? > > -- > With Regards, > Ashutosh Sharma > EnterpriseDB:http://www.enterprisedb.com > > On Mon, Aug 17, 2020 at 11:35 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > Hello Masahiko-san, > > > > Thanks for the review. Please check the comments inline below: > > > > On Fri, Aug 14, 2020 at 10:07 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > Thank you for updating the patch! Here are my comments on v5 patch: > > > > > > --- a/contrib/Makefile > > > +++ b/contrib/Makefile > > > @@ -35,6 +35,7 @@ SUBDIRS = \ > > > pg_standby \ > > > pg_stat_statements \ > > > pg_trgm \ > > > + pg_surgery \ > > > pgcrypto \ > > > > > > I guess we use alphabetical order here. So pg_surgery should be placed > > > before pg_trgm. > > > > > > > Okay, will take care of this in the next version of patch. > > > > > --- > > > + if (heap_force_opt == HEAP_FORCE_KILL) > > > + ItemIdSetDead(itemid); > > > > > > I think that if the page is an all-visible page, we should clear an > > > all-visible bit on the visibility map corresponding to the page and > > > PD_ALL_VISIBLE on the page header. Otherwise, index only scan would > > > return the wrong results. > > > > > > > I think we should let VACUUM do that. Please note that this module is > > intended to be used only on a damaged relation and should only be > > operated on damaged tuples of such relations. And the execution of any > > of the functions provided by this module on a damaged relation must be > > followed by VACUUM with DISABLE_PAGE_SKIPPING option on that relation. > > This is necessary to bring back a damaged relation to the sane state > > once a surgery is performed on it. I will try to add this note in the > > documentation for this module. > > > > > --- > > > + /* > > > + * We do not mark the buffer dirty or do WAL logging for unmodifed > > > + * pages. > > > + */ > > > + if (!did_modify_page) > > > + goto skip_wal; > > > + > > > + /* Mark buffer dirty before we write WAL. */ > > > + MarkBufferDirty(buf); > > > + > > > + /* XLOG stuff */ > > > + if (RelationNeedsWAL(rel)) > > > + log_newpage_buffer(buf, true); > > > + > > > +skip_wal: > > > + END_CRIT_SECTION(); > > > + > > > > > > s/unmodifed/unmodified/ > > > > > > > okay, will fix this typo. > > > > > Do we really need to use goto? I think we can modify it like follows: > > > > > > if (did_modity_page) > > > { > > > /* Mark buffer dirty before we write WAL. */ > > > MarkBufferDirty(buf); > > > > > > /* XLOG stuff */ > > > if (RelationNeedsWAL(rel)) > > > log_newpage_buffer(buf, true); > > > } > > > > > > END_CRIT_SECTION(); > > > > > > > No, we don't need it. We can achieve the same by checking the status > > of did_modify_page flag as you suggested. I will do this change in the > > next version. > > > > > --- > > > pg_force_freeze() can revival a tuple that is already deleted but not > > > vacuumed yet. Therefore, the user might need to reindex indexes after > > > using that function. For instance, with the following script, the last > > > two queries: index scan and seq scan, will return different results. > > > > > > set enable_seqscan to off; > > > set enable_bitmapscan to off; > > > set enable_indexonlyscan to off; > > > create table tbl (a int primary key); > > > insert into tbl values (1); > > > > > > update tbl set a = a + 100 where a = 1; > > > > > > explain analyze select * from tbl where a < 200; > > > > > > -- revive deleted tuple on heap > > > select heap_force_freeze('tbl', array['(0,1)'::tid]); > > > > > > -- index scan returns 2 tuples > > > explain analyze select * from tbl where a < 200; > > > > > > -- seq scan returns 1 tuple > > > set enable_seqscan to on; > > > explain analyze select * from tbl; > > > > > > > I am not sure if this is the right use-case of pg_force_freeze > > function. I think we should only be running pg_force_freeze function > > on a tuple for which VACUUM reports "found xmin ABC from before > > relfrozenxid PQR" sort of error otherwise it might worsen the things > > instead of making it better. Now, the question is - can VACUUM report > > this type of error for a deleted tuple or it would only report it for > > a live tuple? AFAIU this won't be reported for the deleted tuples > > because VACUUM wouldn't consider freezing a tuple that has been > > deleted. > > > > > Also, if a tuple updated and moved to another partition is revived by > > > heap_force_freeze(), its ctid still has special values: > > > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't > > > see a problem yet caused by a visible tuple having the special ctid > > > value, but it might be worth considering either to reset ctid value as > > > well or to not freezing already-deleted tuple. > > > > > > > For this as well, the answer remains the same as above. > > > > -- > > With Regards, > > Ashutosh Sharma > > EnterpriseDB:http://www.enterprisedb.com
Attachment
pgsql-hackers by date: