Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CAE9k0P=EBb1JANX0rK95TY2+v_-V-FdT+s_hXdCoPuQ5zPx+9g@mail.gmail.com
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."
Re: recovering from "found xmin ... from before relfrozenxid ..."
List pgsql-hackers
Hi Robert,

Thanks for the review. Please find my comments inline:

On Sat, Aug 1, 2020 at 12:18 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jul 31, 2020 at 8:52 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > Attached is the new version of patch that addresses the comments from Andrey and Beena.
>
> +PGFILEDESC = "pg_surgery - perform surgery on the damaged heap table"
>
> the -> a
>
> I also suggest: heap table -> relation, because we might want to
> extend this to other cases later.
>

Corrected.

> +-- toast table.
> +begin;
> +create table ttab(a text);
> +insert into ttab select string_agg(chr(floor(random() * 26)::int +
> 65), '') from generate_series(1,10000);
> +select * from ttab where xmin = 2;
> + a
> +---
> +(0 rows)
> +
> +select heap_force_freeze('ttab'::regclass, ARRAY['(0, 1)']::tid[]);
> + heap_force_freeze
> +-------------------
> +
> +(1 row)
> +
>
> I don't understand the point of this. You're not testing the function
> on the TOAST table; you're testing it on the main table when there
> happens to be a TOAST table that is probably getting used for
> something. But that's not really relevant to what is being tested
> here, so as written this seems redundant with the previous cases.
>

Yeah, it's being tested on the main table, not on a toast table. I've
removed this test-case and also restricted direct access to the toast
table using heap_force_kill/freeze functions. I think we shouldn't be
using these functions to do any changes in the toast table. We will
only use these functions with the main table and let VACUUM remove the
corresponding data chunks (pointed by the tuple that got removed from
the main table).

Another option would be to identify all the data chunks corresponding
to the tuple (ctid) being killed from the main table and remove them
one by one. We will only do this if the tuple from the main table that
has been marked as killed has an external storage. We will have to add
a bunch of code for this otherwise we can let VACUUM do this for us.
Let me know your thoughts on this.

> +-- test pg_surgery functions with the unsupported relations. Should fail.
>
> Please name the specific functions being tested here in case we add
> more in the future that are tested separately.
>

Done.

> +++ b/contrib/pg_surgery/heap_surgery_funcs.c
>
> I think we could drop _funcs from the file name.
>

Done.

> +#ifdef PG_MODULE_MAGIC
> +PG_MODULE_MAGIC;
> +#endif
>
> The #ifdef here is not required, and if you look at other contrib
> modules you'll see that they don't have it.
>

Okay, done.

> I don't like all the macros at the top of the file much. CHECKARRVALID
> is only used in one place, so it seems to me that you might as well
> just inline it and lose the macro. Likewise for SORT and ARRISEMPTY.
>

Done.

> Once you do that, heap_force_common() can just compute the number of
> array elements once, instead of doing it once inside ARRISEMPTY, then
> again inside SORT, and then a third time to initialize ntids. You can
> just have a local variable in that function that is set once and then
> used as needed. Then you are only doing ARRNELEMS once, so you can get
> rid of that macro too. The same technique can be used to get rid of
> ARRPTR. So then all the macros go away without introducing any code
> duplication.
>

Done.

> +/* Options to forcefully change the state of a heap tuple. */
> +typedef enum HTupleForceOption
> +{
> + FORCE_KILL,
> + FORCE_FREEZE
> +} HTupleForceOption;
>
> I suggest un-abbreviating HTuple to HeapTuple and un-abbreviating the
> enum members to HEAP_FORCE_KILL and HEAP_FORCE_FREE.

Done.

Also, how about
> option -> operation?
>

I think both look okay to me.

> + return heap_force_common(fcinfo, FORCE_KILL);
>
> I think it might be more idiomatic to use PG_RETURN_DATUM here.  I
> know it's the same thing, though, and perhaps I'm even wrong about the
> prevailing style.
>

Done.

> + Assert(force_opt == FORCE_KILL || force_opt == FORCE_FREEZE);
>
> I think this is unnecessary. It's an enum with 2 values.
>

Removed.

> + if (ARRISEMPTY(ta))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("empty tid array")));
>
> I don't see why this should be an error. Why can't we just continue
> normally and if it does nothing, it does nothing? You'd need to change
> the do..while loop to a while loop so that the end condition is tested
> at the top, but that seems fine.
>

I think it's okay to have this check. So, just left it as-is. We do
have such checks in other contrib modules as well wherever the array
is being passed as an input to the function.

> + rel = relation_open(relid, AccessShareLock);
>
> Maybe we should take RowExclusiveLock, since we are going to modify
> rows. Not sure how much it matters, though.
>

I don't know how it would make a difference, but still as you said
replaced AccessShare with RowExclusive.

> + if (!superuser() && GetUserId() != rel->rd_rel->relowner)
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be superuser or object owner to use %s.",
> + force_opt == FORCE_KILL ? "heap_force_kill()" :
> + "heap_force_freeze()")));
>
> This is the wrong way to do a permissions check, and it's also the
> wrong way to write an error message about having failed one. To see
> the correct method, grep for pg_class_aclcheck. However, I think that
> we shouldn't in general trust the object owner to do this, unless the
> super-user gave permission. This is a data-corrupting operation, and
> only the boss is allowed to authorize it. So I think you should also
> add REVOKE EXECUTE FROM PUBLIC statements to the SQL file, and then
> have this check as a backup. Then, the superuser is always allowed,
> and if they choose to GRANT EXECUTE on this function to some users,
> those users can do it for their own relations, but not others.
>

Done.

> + if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("only heap AM is supported")));
> +
> + check_relation_relkind(rel);
>
> Seems like these checks are in the wrong order.

I don't think there is anything wrong with the order. I can see the
same order in other contrib modules as well.

Also, maybe you could
> call the function something like check_relation_ok() and put the
> permissions test, the relkind test, and the relam test all inside of
> it, just to tighten up the code in this main function a bit.
>

Yeah, I've added a couple of functions named sanity_check_relation and
sanity_check_tid_array and shifted all the sanity checks there.

> + if (noffs > maxoffset)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("number of offsets specified for block %u exceeds the max
> offset number %u",
> + blkno, maxoffset)));
>
> Hmm, this doesn't seem quite right. The actual problem is if an
> individual item pointer's offset number is greater than maxoffset,
> which can be true even if the total number of offsets is less than
> maxoffset. So I think you need to remove this check and add a check
> inside the loop which follows that offnos[i] is in range.
>

Agreed and done.

> The way you've structured that loop is actually problematic -- I don't
> think we want to be calling elog() or ereport() inside a critical
> section. You could fix the case that checks for an invalid force_opt
> by just doing if (op == HEAP_FORCE_KILL) { ... } else { Assert(op ==
> HEAP_FORCE_FREEZE); ... }, or by using a switch with no default. The
> NOTICE case you have here is a bigger problem.

Done.

You really cannot
> modify the buffer like this and then decide, oops, never mind, I think
> I won't mark it dirty or write WAL for the changes. If you do that,
> the buffer is still in memory, but it's now been modified. A
> subsequent operation that modifies it will start with the altered
> state you created here, quite possibly leading to WAL that cannot be
> correctly replayed on the standby. In other words, you've got to
> decide for certain whether you want to proceed with the operation
> *before* you enter the critical section. You also need to emit any
> messages before or after the critical section.  So you could:
>

This is still not clear. I think Robert needs to respond to my earlier comment.

> I believe this violates our guidelines on message construction. Have
> two completely separate messages -- and maybe lose the word "already":
>
> "skipping tid (%u, %u) because it is dead"
> "skipping tid (%u, %u) because it is unused"
>
> The point of this is that it makes it easier for translators.
>

Done.

> I see very little point in what verify_tid() is doing. Before using
> each block number, we should check that it's less than or equal to a
> cached value of RelationGetNumberOfBlocks(rel). That's necessary in
> any case to avoid funny errors; and then the check here against
> specifically InvalidBlockNumber is redundant. For the offset number,
> same thing: we need to check each offset against the page's
> PageGetMaxOffsetNumber(page); and if we do that then we don't need
> these checks.
>

Done.

Please check the attached patch for the changes.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [PATCH] - Provide robust alternatives for replace_string
Next
From: Dmitry Dolgov
Date:
Subject: Re: [HACKERS] [PATCH] Generic type subscripting