Re: Online enabling of checksums - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: Online enabling of checksums
Date
Msg-id FB948276-7B32-4B77-83E6-D00167F8EEB4@yesql.se
Whole thread Raw
In response to Re: Online enabling of checksums  (Magnus Hagander <magnus@hagander.net>)
Responses Re: Online enabling of checksums
Re: Online enabling of checksums
List pgsql-hackers
> On 05 Apr 2018, at 23:13, Magnus Hagander <magnus@hagander.net> wrote:

> (And yes, we've noticed it's failing in isolationtester on a number of boxes -- Daniel is currently investigating)

Looking into the isolationtester failure on piculet, which builds using
--disable-atomics, and locust which doesn’t have atomics, the code for
pg_atomic_test_set_flag seems a bit odd.

TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
defined to return True if it could set.  When running without atomics, don’t we
need to do something like the below diff to make these APIs match?  :

--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
 bool
 pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
 {
-       return TAS((slock_t *) &ptr->sema);
+       return TAS((slock_t *) &ptr->sema) == 0;
 }

Applying this makes the _cancel test pass, moving the failure instead to the
following _enable test (which matches what coypu and mylodon are seeing).

AFAICT there are no other callers of this than the online checksum code, and
this is not executed by the tests when running without atomics, which could
explain why nothing else is broken.

Before continuing the debugging, does this theory hold any water?  This isn’t
code I’m deeply familiar with so would appreciate any pointers.

cheers ./daniel

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [HACKERS] path toward faster partition pruning
Next
From: Thomas Munro
Date:
Subject: Re: Checkpoint not retrying failed fsync?