Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date | |
Msg-id | CA+fd4k5jp5aGH6GGvcyO41aSXVo4AVPZo2q9FDx8c7BDWU1J_A@mail.gmail.com Whole thread Raw |
In response to | Re: recovering from "found xmin ... from before relfrozenxid ..." (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses |
Re: recovering from "found xmin ... from before relfrozenxid ..."
Re: recovering from "found xmin ... from before relfrozenxid ..." |
List | pgsql-hackers |
On Wed, 5 Aug 2020 at 22:42, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Hi Robert, > > Thanks for the review. Please find my comments inline: > > On Sat, Aug 1, 2020 at 12:18 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Fri, Jul 31, 2020 at 8:52 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > Attached is the new version of patch that addresses the comments from Andrey and Beena. > > > > +PGFILEDESC = "pg_surgery - perform surgery on the damaged heap table" > > > > the -> a > > > > I also suggest: heap table -> relation, because we might want to > > extend this to other cases later. > > > > Corrected. > > > +-- toast table. > > +begin; > > +create table ttab(a text); > > +insert into ttab select string_agg(chr(floor(random() * 26)::int + > > 65), '') from generate_series(1,10000); > > +select * from ttab where xmin = 2; > > + a > > +--- > > +(0 rows) > > + > > +select heap_force_freeze('ttab'::regclass, ARRAY['(0, 1)']::tid[]); > > + heap_force_freeze > > +------------------- > > + > > +(1 row) > > + > > > > I don't understand the point of this. You're not testing the function > > on the TOAST table; you're testing it on the main table when there > > happens to be a TOAST table that is probably getting used for > > something. But that's not really relevant to what is being tested > > here, so as written this seems redundant with the previous cases. > > > > Yeah, it's being tested on the main table, not on a toast table. I've > removed this test-case and also restricted direct access to the toast > table using heap_force_kill/freeze functions. I think we shouldn't be > using these functions to do any changes in the toast table. We will > only use these functions with the main table and let VACUUM remove the > corresponding data chunks (pointed by the tuple that got removed from > the main table). > > Another option would be to identify all the data chunks corresponding > to the tuple (ctid) being killed from the main table and remove them > one by one. We will only do this if the tuple from the main table that > has been marked as killed has an external storage. We will have to add > a bunch of code for this otherwise we can let VACUUM do this for us. > Let me know your thoughts on this. > > > +-- test pg_surgery functions with the unsupported relations. Should fail. > > > > Please name the specific functions being tested here in case we add > > more in the future that are tested separately. > > > > Done. > > > +++ b/contrib/pg_surgery/heap_surgery_funcs.c > > > > I think we could drop _funcs from the file name. > > > > Done. > > > +#ifdef PG_MODULE_MAGIC > > +PG_MODULE_MAGIC; > > +#endif > > > > The #ifdef here is not required, and if you look at other contrib > > modules you'll see that they don't have it. > > > > Okay, done. > > > I don't like all the macros at the top of the file much. CHECKARRVALID > > is only used in one place, so it seems to me that you might as well > > just inline it and lose the macro. Likewise for SORT and ARRISEMPTY. > > > > Done. > > > Once you do that, heap_force_common() can just compute the number of > > array elements once, instead of doing it once inside ARRISEMPTY, then > > again inside SORT, and then a third time to initialize ntids. You can > > just have a local variable in that function that is set once and then > > used as needed. Then you are only doing ARRNELEMS once, so you can get > > rid of that macro too. The same technique can be used to get rid of > > ARRPTR. So then all the macros go away without introducing any code > > duplication. > > > > Done. > > > +/* Options to forcefully change the state of a heap tuple. */ > > +typedef enum HTupleForceOption > > +{ > > + FORCE_KILL, > > + FORCE_FREEZE > > +} HTupleForceOption; > > > > I suggest un-abbreviating HTuple to HeapTuple and un-abbreviating the > > enum members to HEAP_FORCE_KILL and HEAP_FORCE_FREE. > > Done. > > Also, how about > > option -> operation? > > > > I think both look okay to me. > > > + return heap_force_common(fcinfo, FORCE_KILL); > > > > I think it might be more idiomatic to use PG_RETURN_DATUM here. I > > know it's the same thing, though, and perhaps I'm even wrong about the > > prevailing style. > > > > Done. > > > + Assert(force_opt == FORCE_KILL || force_opt == FORCE_FREEZE); > > > > I think this is unnecessary. It's an enum with 2 values. > > > > Removed. > > > + if (ARRISEMPTY(ta)) > > + ereport(ERROR, > > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > + errmsg("empty tid array"))); > > > > I don't see why this should be an error. Why can't we just continue > > normally and if it does nothing, it does nothing? You'd need to change > > the do..while loop to a while loop so that the end condition is tested > > at the top, but that seems fine. > > > > I think it's okay to have this check. So, just left it as-is. We do > have such checks in other contrib modules as well wherever the array > is being passed as an input to the function. > > > + rel = relation_open(relid, AccessShareLock); > > > > Maybe we should take RowExclusiveLock, since we are going to modify > > rows. Not sure how much it matters, though. > > > > I don't know how it would make a difference, but still as you said > replaced AccessShare with RowExclusive. > > > + if (!superuser() && GetUserId() != rel->rd_rel->relowner) > > + ereport(ERROR, > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > + errmsg("must be superuser or object owner to use %s.", > > + force_opt == FORCE_KILL ? "heap_force_kill()" : > > + "heap_force_freeze()"))); > > > > This is the wrong way to do a permissions check, and it's also the > > wrong way to write an error message about having failed one. To see > > the correct method, grep for pg_class_aclcheck. However, I think that > > we shouldn't in general trust the object owner to do this, unless the > > super-user gave permission. This is a data-corrupting operation, and > > only the boss is allowed to authorize it. So I think you should also > > add REVOKE EXECUTE FROM PUBLIC statements to the SQL file, and then > > have this check as a backup. Then, the superuser is always allowed, > > and if they choose to GRANT EXECUTE on this function to some users, > > those users can do it for their own relations, but not others. > > > > Done. > > > + if (rel->rd_rel->relam != HEAP_TABLE_AM_OID) > > + ereport(ERROR, > > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > + errmsg("only heap AM is supported"))); > > + > > + check_relation_relkind(rel); > > > > Seems like these checks are in the wrong order. > > I don't think there is anything wrong with the order. I can see the > same order in other contrib modules as well. > > Also, maybe you could > > call the function something like check_relation_ok() and put the > > permissions test, the relkind test, and the relam test all inside of > > it, just to tighten up the code in this main function a bit. > > > > Yeah, I've added a couple of functions named sanity_check_relation and > sanity_check_tid_array and shifted all the sanity checks there. > > > + if (noffs > maxoffset) > > + ereport(ERROR, > > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > + errmsg("number of offsets specified for block %u exceeds the max > > offset number %u", > > + blkno, maxoffset))); > > > > Hmm, this doesn't seem quite right. The actual problem is if an > > individual item pointer's offset number is greater than maxoffset, > > which can be true even if the total number of offsets is less than > > maxoffset. So I think you need to remove this check and add a check > > inside the loop which follows that offnos[i] is in range. > > > > Agreed and done. > > > The way you've structured that loop is actually problematic -- I don't > > think we want to be calling elog() or ereport() inside a critical > > section. You could fix the case that checks for an invalid force_opt > > by just doing if (op == HEAP_FORCE_KILL) { ... } else { Assert(op == > > HEAP_FORCE_FREEZE); ... }, or by using a switch with no default. The > > NOTICE case you have here is a bigger problem. > > Done. > > You really cannot > > modify the buffer like this and then decide, oops, never mind, I think > > I won't mark it dirty or write WAL for the changes. If you do that, > > the buffer is still in memory, but it's now been modified. A > > subsequent operation that modifies it will start with the altered > > state you created here, quite possibly leading to WAL that cannot be > > correctly replayed on the standby. In other words, you've got to > > decide for certain whether you want to proceed with the operation > > *before* you enter the critical section. You also need to emit any > > messages before or after the critical section. So you could: > > > > This is still not clear. I think Robert needs to respond to my earlier comment. > > > I believe this violates our guidelines on message construction. Have > > two completely separate messages -- and maybe lose the word "already": > > > > "skipping tid (%u, %u) because it is dead" > > "skipping tid (%u, %u) because it is unused" > > > > The point of this is that it makes it easier for translators. > > > > Done. > > > I see very little point in what verify_tid() is doing. Before using > > each block number, we should check that it's less than or equal to a > > cached value of RelationGetNumberOfBlocks(rel). That's necessary in > > any case to avoid funny errors; and then the check here against > > specifically InvalidBlockNumber is redundant. For the offset number, > > same thing: we need to check each offset against the page's > > PageGetMaxOffsetNumber(page); and if we do that then we don't need > > these checks. > > > > Done. > > Please check the attached patch for the changes. I also looked at this version patch and have some small comments: + Oid relid = PG_GETARG_OID(0); + ArrayType *ta = PG_GETARG_ARRAYTYPE_P_COPY(1); + ItemPointer tids; + int ntids; + Relation rel; + Buffer buf; + Page page; + ItemId itemid; + BlockNumber blkno; + OffsetNumber *offnos; + OffsetNumber offno, + noffs, + curr_start_ptr, + next_start_ptr, + maxoffset; + int i, + nskippedItems, + nblocks; You declare all variables at the top of heap_force_common() function but I think we can declare some variables such as buf, page inside of the do loop. --- + if (offnos[i] > maxoffset) + { + ereport(NOTICE, + errmsg("skipping tid (%u, %u) because it contains an invalid offset", + blkno, offnos[i])); + continue; + } If all tids on a page take the above path, we will end up logging FPI in spite of modifying nothing on the page. --- + /* XLOG stuff */ + if (RelationNeedsWAL(rel)) + log_newpage_buffer(buf, true); I think we need to set the returned LSN by log_newpage_buffer() to the page lsn. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: