Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date | |
Msg-id | CA+TgmoZLpeFNg5ypkXfKdvKktZKSwxprnB3TinDUDz12fdKScw@mail.gmail.com Whole thread Raw |
In response to | Re: recovering from "found xmin ... from before relfrozenxid ..." (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses |
Re: recovering from "found xmin ... from before relfrozenxid ..."
Re: recovering from "found xmin ... from before relfrozenxid ..." |
List | pgsql-hackers |
On Fri, Jul 31, 2020 at 8:52 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Attached is the new version of patch that addresses the comments from Andrey and Beena. +PGFILEDESC = "pg_surgery - perform surgery on the damaged heap table" the -> a I also suggest: heap table -> relation, because we might want to extend this to other cases later. +-- toast table. +begin; +create table ttab(a text); +insert into ttab select string_agg(chr(floor(random() * 26)::int + 65), '') from generate_series(1,10000); +select * from ttab where xmin = 2; + a +--- +(0 rows) + +select heap_force_freeze('ttab'::regclass, ARRAY['(0, 1)']::tid[]); + heap_force_freeze +------------------- + +(1 row) + I don't understand the point of this. You're not testing the function on the TOAST table; you're testing it on the main table when there happens to be a TOAST table that is probably getting used for something. But that's not really relevant to what is being tested here, so as written this seems redundant with the previous cases. +-- test pg_surgery functions with the unsupported relations. Should fail. Please name the specific functions being tested here in case we add more in the future that are tested separately. +++ b/contrib/pg_surgery/heap_surgery_funcs.c I think we could drop _funcs from the file name. +#ifdef PG_MODULE_MAGIC +PG_MODULE_MAGIC; +#endif The #ifdef here is not required, and if you look at other contrib modules you'll see that they don't have it. I don't like all the macros at the top of the file much. CHECKARRVALID is only used in one place, so it seems to me that you might as well just inline it and lose the macro. Likewise for SORT and ARRISEMPTY. Once you do that, heap_force_common() can just compute the number of array elements once, instead of doing it once inside ARRISEMPTY, then again inside SORT, and then a third time to initialize ntids. You can just have a local variable in that function that is set once and then used as needed. Then you are only doing ARRNELEMS once, so you can get rid of that macro too. The same technique can be used to get rid of ARRPTR. So then all the macros go away without introducing any code duplication. +/* Options to forcefully change the state of a heap tuple. */ +typedef enum HTupleForceOption +{ + FORCE_KILL, + FORCE_FREEZE +} HTupleForceOption; I suggest un-abbreviating HTuple to HeapTuple and un-abbreviating the enum members to HEAP_FORCE_KILL and HEAP_FORCE_FREE. Also, how about option -> operation? + return heap_force_common(fcinfo, FORCE_KILL); I think it might be more idiomatic to use PG_RETURN_DATUM here. I know it's the same thing, though, and perhaps I'm even wrong about the prevailing style. + Assert(force_opt == FORCE_KILL || force_opt == FORCE_FREEZE); I think this is unnecessary. It's an enum with 2 values. + if (ARRISEMPTY(ta)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("empty tid array"))); I don't see why this should be an error. Why can't we just continue normally and if it does nothing, it does nothing? You'd need to change the do..while loop to a while loop so that the end condition is tested at the top, but that seems fine. + rel = relation_open(relid, AccessShareLock); Maybe we should take RowExclusiveLock, since we are going to modify rows. Not sure how much it matters, though. + if (!superuser() && GetUserId() != rel->rd_rel->relowner) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser or object owner to use %s.", + force_opt == FORCE_KILL ? "heap_force_kill()" : + "heap_force_freeze()"))); This is the wrong way to do a permissions check, and it's also the wrong way to write an error message about having failed one. To see the correct method, grep for pg_class_aclcheck. However, I think that we shouldn't in general trust the object owner to do this, unless the super-user gave permission. This is a data-corrupting operation, and only the boss is allowed to authorize it. So I think you should also add REVOKE EXECUTE FROM PUBLIC statements to the SQL file, and then have this check as a backup. Then, the superuser is always allowed, and if they choose to GRANT EXECUTE on this function to some users, those users can do it for their own relations, but not others. + if (rel->rd_rel->relam != HEAP_TABLE_AM_OID) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("only heap AM is supported"))); + + check_relation_relkind(rel); Seems like these checks are in the wrong order. Also, maybe you could call the function something like check_relation_ok() and put the permissions test, the relkind test, and the relam test all inside of it, just to tighten up the code in this main function a bit. + if (noffs > maxoffset) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of offsets specified for block %u exceeds the max offset number %u", + blkno, maxoffset))); Hmm, this doesn't seem quite right. The actual problem is if an individual item pointer's offset number is greater than maxoffset, which can be true even if the total number of offsets is less than maxoffset. So I think you need to remove this check and add a check inside the loop which follows that offnos[i] is in range. The way you've structured that loop is actually problematic -- I don't think we want to be calling elog() or ereport() inside a critical section. You could fix the case that checks for an invalid force_opt by just doing if (op == HEAP_FORCE_KILL) { ... } else { Assert(op == HEAP_FORCE_FREEZE); ... }, or by using a switch with no default. The NOTICE case you have here is a bigger problem. You really cannot modify the buffer like this and then decide, oops, never mind, I think I won't mark it dirty or write WAL for the changes. If you do that, the buffer is still in memory, but it's now been modified. A subsequent operation that modifies it will start with the altered state you created here, quite possibly leading to WAL that cannot be correctly replayed on the standby. In other words, you've got to decide for certain whether you want to proceed with the operation *before* you enter the critical section. You also need to emit any messages before or after the critical section. So you could: 1. If you encounter a TID that's unused or dead, skip it silently. -or- 2. Loop over offsets twice. The first time, ERROR if you find any one that is unused or dead. Then start a critical section. Loop again and do the real work. -or- 3. Like #2, but emit a NOTICE about a unused or dead item rather than an ERROR, and skip the critical section and the second loop if you did that >0 times. -or- 4. Like #3, but don't skip anything just because you emitted a NOTICE about the page. #3 is closest to the behavior you have now, but I'm not sure what else it has going for it. It doesn't seem like particularly intuitive behavior that finding a dead or unused TID should cause other item TIDs on the same page not to get processed while still permitting TIDs on other pages to get processed. I don't think that's the behavior users will be expecting. I think my vote is for #4, which will emit a NOTICE about any TID that is dead or unused -- and I guess also about any TID whose offset number is out of range -- but won't actually skip any operations that can be performed. But there are decent arguments for #1 or #2 too. + (errmsg("skipping tid (%u, %u) because it is already marked %s", + blkno, offnos[i], + ItemIdIsDead(itemid) ? "dead" : "unused"))); I believe this violates our guidelines on message construction. Have two completely separate messages -- and maybe lose the word "already": "skipping tid (%u, %u) because it is dead" "skipping tid (%u, %u) because it is unused" The point of this is that it makes it easier for translators. I see very little point in what verify_tid() is doing. Before using each block number, we should check that it's less than or equal to a cached value of RelationGetNumberOfBlocks(rel). That's necessary in any case to avoid funny errors; and then the check here against specifically InvalidBlockNumber is redundant. For the offset number, same thing: we need to check each offset against the page's PageGetMaxOffsetNumber(page); and if we do that then we don't need these checks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: