Re: Injection point locking - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Injection point locking
Date
Msg-id 87c748b3-0786-490f-a17a-51bef63e6c7f@iki.fi
Whole thread Raw
In response to Re: Injection point locking  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Injection point locking
List pgsql-hackers
On 09/07/2024 08:16, Michael Paquier wrote:
>   typedef struct InjectionPointCacheEntry
>   {
>       char        name[INJ_NAME_MAXLEN];
> +    int            slot_idx;
> +    uint64        generation;
>       char        private_data[INJ_PRIVATE_MAXLEN];
>       InjectionPointCallback callback;
>   } InjectionPointCacheEntry;
> 
> May be worth mentioning that generation is a copy of
> InjectionPointEntry's generation cross-checked at runtime with the
> shmem entry to see if we have a cache consistent with shmem under the
> same point name.

Added a comment.

> +        generation = pg_atomic_read_u64(&entry->generation);
> +        if (generation % 2 == 0)
> +            continue;
> In the loops of InjectionPointCacheRefresh() and
> InjectionPointDetach(), perhaps this should say that the slot is not
> used hence skipped when generation is even.

Added a brief "/* empty slot */" comment

> InjectionPointDetach() has this code block at its end:
>      if (!found)
>          return false;
>      return true;
> 
> Not the fault of this patch, but this can just return "found".

Done.

> The tricks with max_inuse to make the shmem lookups cheaper are
> interesting.
> 
> +        pg_read_barrier();
> +        if (memcmp(entry->name, name, namelen + 1) != 0)
> +            continue;
> Why this barrier when checking the name of a shmem entry before
> reloading it in the local cache?  Perhaps the reason should be
> commented?

Added a comment.

> +        pg_read_barrier();
> +        if (pg_atomic_read_u64(&entry->generation) != generation)
> +            continue;            /* was detached concurrently */
> +
> +        return injection_point_cache_load(&local_copy, idx, generation);
> 
> So, in InjectionPointCacheRefresh(), when a point is loaded into the
> local cache for the first time, the read of "generation" is the
> tipping point: it is possible to take a breakpoint at the beginning of
> injection_point_cache_load(), detach then attach the point.  What
> matters is that we are going to use the data in local_copy, even if
> shmem may have something entirely different.  Hmm.  Okay.  It is a bit
> annoying that the entry is just discarded and ignored if the local
> copy and shmem generations don't match?  Could it be more
> user-friendly to go back to the beginning of ActiveInjectionPoints and
> re-check the whole rather than return a NULL callback?

I thought about it, but no. If the generation number doesn't match, 
there are a few possibilities:

1. The entry was what we were looking for, but it was concurrently 
detached. Return NULL is correct in that case.

2. The entry was what we were looking for, but it was concurrently 
detached, and was then immediately reattached. NULL is a fine return 
value in that case too. When Run runs concurrently with Detach+Attach, 
you don't get any guarantee whether the actual apparent order is 
"Detach, Attach, Run", "Detach, Run, Attach", or "Run, Detach, Attach". 
NULL result corresponds to the "Detach, Run, Attach" ordering.

3. The entry was not actually what we were looking for. The name 
comparison falsely matched just because the slot was concurrently 
detached and recycled for a different injection point. We must continue 
the search in that case.

I added a comment to the top of the loop to explain scenario 2. And a 
comment to the "continue" to explain scnario 3, because that's a bit subtle.

> -   if (private_data != NULL)
> -       memcpy(entry->private_data, private_data, INJ_PRIVATE_MAXLEN);
> +   memcpy(entry->private_data, private_data, INJ_PRIVATE_MAXLEN);
> 
> private_data could be NULL, hence why the memcpy()?

It can not be NULL. You can pass NULL or a shorter length, to 
InjectionPointAttach(), but we don't store the length in shared memory.

Attached is a new version. No other changes except for fixes for the 
things you pointed out and comments.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

pgsql-hackers by date:

Previous
From: Dean Rasheed
Date:
Subject: Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
Next
From: shveta malik
Date:
Subject: Re: Conflict Detection and Resolution