Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CAE9k0P=QAmVBs=D+SdtNn=G_1_-f_MC9sAXK=SYsSy2BPedhcw@mail.gmail.com
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
Hi Masahiko-san,

Thank you for the review. Please check my comments inline below:

On Tue, Aug 25, 2020 at 1:39 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 21 Aug 2020 at 22:25, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Hi Masahiko-san,
> >
> > Please find the updated patch with the following new changes:
> >
>
> Thank you for updating the patch!
>
> > 1) It adds the code changes in heap_force_kill function to clear an
> > all-visible bit on the visibility map corresponding to the page that
> > is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
> > flag on the page header.
>
> I think we need to clear all visibility map bits by using
> VISIBILITYMAP_VALID_BITS. Otherwise, the page has all-frozen bit but
> not all-visible bit, which is not a valid state.
>

Yeah, makes sense, I will do that change in the next version of patch.

> >
> > 2) It adds the code changes in heap_force_freeze function to reset the
> > ctid value in a tuple header if it is corrupted.
> >
> > 3) It adds several notes and examples in the documentation stating
> > when and how we need to use the functions provided by this module.
> >
> > Please have a look and let me know for any other concern.
> >
>
> And here are small comments on the heap_surgery.c:
>
> +       /*
> +        * Get the offset numbers from the tids belonging to one particular page
> +        * and process them one by one.
> +        */
> +       blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> +                                            offnos);
> +
> +       /* Calculate the number of offsets stored in offnos array. */
> +       noffs = next_start_ptr - curr_start_ptr;
> +
> +       /*
> +        * Update the current start pointer so that next time when
> +        * tids_same_page_fetch_offnums() is called, we can calculate the number
> +        * of offsets present in the offnos array.
> +        */
> +       curr_start_ptr = next_start_ptr;
> +
> +       /* Check whether the block number is valid. */
> +       if (blkno >= nblocks)
> +       {
> +           ereport(NOTICE,
> +                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                    errmsg("skipping block %u for relation \"%s\"
> because the block number is out of range",
> +                           blkno, RelationGetRelationName(rel))));
> +           continue;
> +       }
> +
> +       CHECK_FOR_INTERRUPTS();
>
> I guess it would be better to call CHECK_FOR_INTERRUPTS() at the top
> of the do loop for safety. I think it's unlikely to happen but the
> user might mistakenly specify a lot of wrong block numbers.
>

Okay, np, will shift it to top of the do loop.

> ----
> +   offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
> +   noffs = curr_start_ptr = next_start_ptr = 0;
> +   nblocks = RelationGetNumberOfBlocks(rel);
> +
> +   do
> +   {
>
> (snip)
>
> +
> +       /*
> +        * Get the offset numbers from the tids belonging to one particular page
> +        * and process them one by one.
> +        */
> +       blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> +                                            offnos);
> +
> +       /* Calculate the number of offsets stored in offnos array. */
> +       noffs = next_start_ptr - curr_start_ptr;
> +
>
> (snip)
>
> +       /* No ereport(ERROR) from here until all the changes are logged. */
> +       START_CRIT_SECTION();
> +
> +       for (i = 0; i < noffs; i++)
>
> You copy all offset numbers belonging to the same page to palloc'd
> array, offnos, and iterate it while processing the tuples. I might be
> missing something but I think we can do that without allocating the
> space for offset numbers. Is there any reason for this? I guess we can
> do that by just iterating the sorted tids array.
>

Hmmm.. okay, I see your point. I think probably what you are trying to
suggest here is to make use of the current and next start pointers to
get the tids belonging to the same page and process them one by one
instead of fetching the offset numbers of all tids belonging to one
page into the offnos array and then iterate through the offnos array.
I think that is probably possible and I will try to do that in the
next version of patch. If there is something else that you have in
your mind, please let me know.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Next
From: David Rowley
Date:
Subject: Fix a couple of misuages of bms_num_members()