Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id 20230323203656.le7thulot4zrzi6v@awork3.anarazel.de
In response to Re: HOT chain validation in verify_heapam()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: HOT chain validation in verify_heapam()  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2023-03-23 15:37:15 -0400, Robert Haas wrote:
> On Wed, Mar 22, 2023 at 8:38 PM Andres Freund <andres@anarazel.de> wrote:
> > skink / valgrind reported in a while back and found another issue:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-03-22%2021%3A53%3A41
> >
> > ==2490364== VALGRINDERROR-BEGIN
> > ==2490364== Conditional jump or move depends on uninitialised value(s)
> > ==2490364==    at 0x11D459F2: check_tuple_visibility (verify_heapam.c:1379)
> ...
> > ==2490364==  Uninitialised value was created by a stack allocation
> > ==2490364==    at 0x11D45325: check_tuple_visibility (verify_heapam.c:994)
> 
> OK, so this is an interesting one. It's complaining about switch
> (xmax_status), because the get_xid_status(xmax, ctx, &xmax_status)
> used in the previous switch might not actually initialize xmax_status,
> and apparently didn't in this case. get_xid_status() does not set
> xmax_status except when it returns XID_BOUNDS_OK, and the previous
> switch falls through both in that case and also when get_xid_status()
> returns XID_INVALID. That seems like it must be the issue here. As far
> as I can see, this isn't related to any of the recent changes but has
> been like this since this code was introduced, so I'm a little
> confused about why it's only causing a problem now.

Could it be that the tests didn't exercise the path before?


> Nonetheless, here's a patch. I notice that there's a similar problem
> in another place, too. get_xid_status() is called a total of five
> times and it looks like only three of them got it right. I suppose
> that if this is correct we should back-patch it.

Yea, I think you're right.
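
For readers following along, the pattern under discussion (an out-parameter that is written on only one return path, then read unconditionally by the caller) can be sketched in isolation. This is a minimal, self-contained illustration and not the verify_heapam.c code: the `_sketch` functions, the trimmed-down enums, and the bounds check are hypothetical stand-ins, and treating an invalid xid as corruption here is an assumption made only to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the result types discussed above (assumption). */
typedef enum
{
    XID_INVALID,
    XID_IN_FUTURE,
    XID_BOUNDS_OK
} XidBoundsResult;

typedef enum
{
    XID_COMMITTED,
    XID_ABORTED
} XidCommitStatus;

/*
 * Hypothetical lookup: like the function described in the thread, it
 * writes *status ONLY when it returns XID_BOUNDS_OK.
 */
static XidBoundsResult
get_xid_status_sketch(uint32_t xid, XidCommitStatus *status)
{
    if (xid == 0)
        return XID_INVALID;     /* *status left untouched */
    if (xid > 1000)
        return XID_IN_FUTURE;   /* *status left untouched */
    *status = (xid % 2 == 0) ? XID_COMMITTED : XID_ABORTED;
    return XID_BOUNDS_OK;
}

/*
 * Caller that never reads xmax_status unless it was set.  The buggy shape
 * would "break" out of the XID_INVALID case and fall through to the read
 * below, which is what valgrind flags as a use of an uninitialised value.
 */
static bool
xmax_committed_sketch(uint32_t xmax, bool *corrupt)
{
    XidCommitStatus xmax_status;    /* intentionally not initialized */

    *corrupt = false;
    switch (get_xid_status_sketch(xmax, &xmax_status))
    {
        case XID_INVALID:
        case XID_IN_FUTURE:
            *corrupt = true;        /* report and return early */
            return false;
        case XID_BOUNDS_OK:
            break;                  /* xmax_status is valid from here on */
    }

    return xmax_status == XID_COMMITTED;
}
```

The point of the sketch is only the control flow: every case that leaves the out-parameter unset returns before the second read.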


> +            report_corruption(ctx,
> +                              pstrdup("xmin is invalid"));

Not a correctness issue: nearly all callers of report_corruption() pass it
the result of a psprintf(), and the remaining ones a pstrdup(), as here. Seems
like it'd be cleaner to just make report_corruption() accept a format string?
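
As a rough illustration of that suggestion, here is a minimal sketch of a reporting function that takes a printf-style format string, so that callers need neither psprintf() nor pstrdup(). It is not a proposed patch: `report_corruption_fmt` and its buffer arguments are hypothetical, and plain C `vsnprintf()` stands in for PostgreSQL's own string-building routines.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical variadic reporter.  Formats the message into the
 * caller-supplied buffer and returns the length vsnprintf() reports
 * (the length the full message would have, excluding the terminator).
 */
static int
report_corruption_fmt(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list     args;
    int         len;

    va_start(args, fmt);
    len = vsnprintf(buf, buflen, fmt, args);
    va_end(args);

    return len;
}
```

With such a signature, a fixed message and a formatted one go through the same call, e.g. `report_corruption_fmt(buf, sizeof(buf), "xmin is invalid")` and `report_corruption_fmt(buf, sizeof(buf), "xmin %u is invalid", 42u)`.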

Greetings,

Andres Freund


