Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE - Mailing list pgsql-hackers

From Andres Freund
Subject Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Date
Msg-id 20230131223804.o7wlonfqlif2rcdo@alap3.anarazel.de
Whole thread Raw
In response to Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
List pgsql-hackers
Hi,

On 2023-01-31 15:05:17 +0100, Matthias van de Meent wrote:
> On Mon, 30 Jan 2023 at 21:19, Andres Freund <andres@anarazel.de> wrote:
> > In an earlier, not posted, version I had an vacuum_defer_cleanup_age specific
> > helper function for this, but it seems likely we'll need it in other places
> > too.  So I named it TransactionIdRetreatSafely(). I made it accept the xid by
> > pointer, as the line lengths / repetition otherwise end up making it hard to
> > read the code.  For now I have TransactionIdRetreatSafely() be private to
> > procarray.c, but I expect we'll have to change that eventually.
> 
> If TransactionIdRetreatSafely will be exposed outside procarray.c,
> then I think the xid pointer should be replaced with normal
> arguments/returns; both for parity with TransactionIdRetreatedBy

That's why I named one version *Retreat the other Retreated :)

I think it'll make the code easier to read in the other places too, the
variable names / function names in this space are uncomfortably long to
fit into 78chars..., particularly when there's two references to the
same variable in the same line.


> and to remove this memory store dependency in this hot code path.

I doubt that matters much here and the places it's going to be used
in. And presumably the compiler will inline it anyway. I'd probably make
it a static inline in the header too.

What's making me hesitate about exposing it is that it's quite easy to
get things wrong by using a wrong fxid or such.


> > +        /*
> > +         * FIXME, doubtful this is the best fix.
> > +         *
> > +         * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
> > +         * 0. Represent it as an xid from the future instead.
> > +         */
> > +        if (epoch == 0)
> > +            return FullTransactionIdFromEpochAndXid(0, xid);
> 
> Shouldn't this be an error condition instead, as this XID should not
> be able to appear?

If you mean error in the sense of ERROR, no, I don't think so. That code
tries hard to be able to check many tuples in a row. And if we were to
error out here, we'd not able to do that. We should still report those
tuples as corrupt, fwiw.

The reason this path is hit is that a test intentionally corrupts some
xids. So the path is reachable and we need to cope somehow.

I'm not really satisfied with this fix either - I mostly wanted to
include something sufficient to prevent assertion failures.

I had hoped that Mark would look at the amcheck bits and come up with
more complete fixes.


> on 0004:
> 
> > -       '0xffffffffffffffff'::xid8,
> > -       '-1'::xid8;
> > +       '0xefffffffffffffff'::xid8,
> > +       '0'::xid8;
> 
> The 0xFF... usages were replaced with "0xEFFF...". Shouldn't we also
> test on 0xffff_fffE_ffff_ffff to test for input of our actual max
> value?

Probably a good idea.


> > @@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
> >     while (*str != '\0')
> >     {
> >         /* read next value */
> > -        val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
> > +        raw_fxid = strtou64(str, &endp, 10);
> > +
> > +        val = FullTransactionIdFromU64(raw_fxid);
> > +        if (!InFullTransactionIdRange(raw_fxid))
> > +            goto bad_format;
> 
> With assertions enabled FullTransactionIdFromU64 will assert the
> InFullTransactionIdRange condition, meaning we wouldn't hit the branch
> into bad_format.
> I think these operations should be swapped, as parsing a snapshot
> shouldn't run into assertion failures like this if it can error
> normally.

Yep.


> Maybe this can be added to tests as well?

I'll check. I thought for a bit it'd not work because we'd perform range
checks on the xids, but we don't...

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: "Joel Jacobson"
Date:
Subject: Re: [PATCH] Fix old thinko in formula to compute sweight in numeric_sqrt().
Next
From: Peter Geoghegan
Date:
Subject: Re: Show various offset arrays for heap WAL records