Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CA+fd4k6yQbjF=wxLShpBNF7pstsOdqCUXhSNv-v698LLYaYrEA@mail.gmail.com
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."
Re: recovering from "found xmin ... from before relfrozenxid ..."
List pgsql-hackers
On Tue, 25 Aug 2020 at 17:08, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 21 Aug 2020 at 22:25, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Hi Masahiko-san,
> >
> > Please find the updated patch with the following new changes:
> >
>
> Thank you for updating the patch!
>
> > 1) It adds the code changes in heap_force_kill function to clear an
> > all-visible bit on the visibility map corresponding to the page that
> > is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
> > flag on the page header.
>
> I think we need to clear all visibility map bits by using
> VISIBILITYMAP_VALID_BITS. Otherwise, the page has all-frozen bit but
> not all-visible bit, which is not a valid state.
>
> >
> > 2) It adds the code changes in heap_force_freeze function to reset the
> > ctid value in a tuple header if it is corrupted.
> >
> > 3) It adds several notes and examples in the documentation stating
> > when and how we need to use the functions provided by this module.
> >
> > Please have a look and let me know for any other concern.
> >
>
> And here are small comments on the heap_surgery.c:
>
> +       /*
> +        * Get the offset numbers from the tids belonging to one particular page
> +        * and process them one by one.
> +        */
> +       blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> +                                            offnos);
> +
> +       /* Calculate the number of offsets stored in offnos array. */
> +       noffs = next_start_ptr - curr_start_ptr;
> +
> +       /*
> +        * Update the current start pointer so that next time when
> +        * tids_same_page_fetch_offnums() is called, we can calculate the number
> +        * of offsets present in the offnos array.
> +        */
> +       curr_start_ptr = next_start_ptr;
> +
> +       /* Check whether the block number is valid. */
> +       if (blkno >= nblocks)
> +       {
> +           ereport(NOTICE,
> +                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                    errmsg("skipping block %u for relation \"%s\"
> because the block number is out of range",
> +                           blkno, RelationGetRelationName(rel))));
> +           continue;
> +       }
> +
> +       CHECK_FOR_INTERRUPTS();
>
> I guess it would be better to call CHECK_FOR_INTERRUPTS() at the top
> of the do loop for safety. I think it's unlikely to happen but the
> user might mistakenly specify a lot of wrong block numbers.
>
> ----
> +   offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
> +   noffs = curr_start_ptr = next_start_ptr = 0;
> +   nblocks = RelationGetNumberOfBlocks(rel);
> +
> +   do
> +   {
>
> (snip)
>
> +
> +       /*
> +        * Get the offset numbers from the tids belonging to one particular page
> +        * and process them one by one.
> +        */
> +       blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> +                                            offnos);
> +
> +       /* Calculate the number of offsets stored in offnos array. */
> +       noffs = next_start_ptr - curr_start_ptr;
> +
>
> (snip)
>
> +       /* No ereport(ERROR) from here until all the changes are logged. */
> +       START_CRIT_SECTION();
> +
> +       for (i = 0; i < noffs; i++)
>
> You copy all offset numbers belonging to the same page to palloc'd
> array, offnos, and iterate it while processing the tuples. I might be
> missing something but I think we can do that without allocating the
> space for offset numbers. Is there any reason for this? I guess we can
> do that by just iterating the sorted tids array.
>

Let me share other comments on the latest version patch:

Some words need to be tagged. For instance, I found the following words:

VACUUM
DISABLE_PAGE_SKIPPING
HEAP_XMIN_FROZEN
HEAP_XMAX_INVALID

---
+test=# select ctid from t1 where xmin = 507;
+ ctid
+-------
+ (0,3)
+(1 row)
+
+test=# select heap_force_freeze('t1'::regclass, ARRAY['(0, 3)']::tid[]);
+-[ RECORD 1 ]-----+-
+heap_force_freeze |

I think it's better to use a consistent output format. The former uses
the normal format whereas the latter uses the expanded format.

---
+ <note>
+ <para>
+  While performing surgery on a damaged relation, we must not be doing anything
+  else on that relation in parallel. This is to ensure that when we are
+  operating on a damaged tuple there is no other transaction trying to modify
+  that tuple.
+ </para>
+ </note>

If we prefer to avoid concurrent operations on the target relation why
don't we use AccessExclusiveLock?

---
+CREATE FUNCTION heap_force_kill(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_kill'
+LANGUAGE C STRICT;

+CREATE FUNCTION heap_force_freeze(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_freeze'
+LANGUAGE C STRICT;

I think these functions should be PARALLEL UNSAFE.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: passwordcheck: Log cracklib diagnostics
Next
From: Daniel Gustafsson
Date:
Subject: Re: Autovacuum on partitioned table (autoanalyze)