Re: Cleaning up threading code - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Cleaning up threading code
Date
Msg-id 1ca9b09d-5a01-4025-951d-b687d7865666@iki.fi
Whole thread Raw
In response to Cleaning up threading code  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On 21/08/2024 15:51, Thomas Munro wrote:
> On Sun, Apr 14, 2024 at 3:16 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> So I'll go ahead and add the storage class to the next version, and
>> contemplate a couple of different options for the tss stuff, including
>> perhaps leaving it out if that seems doable.
> 
> Here is a new attempt at pg_threads.h.  Still work in progress, but
> passing tests, with storage class and working TSS, showing various
> users.

Looks good, thanks!

> I eventually understood first that my TSS destructor problems on
> Windows came from mismatched calling conventions, and that you can't
> really trampoline your way around that, at least not without doing
> some pretty unreasonable things, and that is why nobody can emulate
> either tss_create() or pthread_key_create() directly with Windows'
> FlsAlloc(), so everybody who tries finishes up building their own
> infrastructure to track destructors, or in ECPG's case just leaks all
> the memory instead.
> 
> Here's the simplest implementation I could come up with so far.  I
> don't have Windows so I made it possible to use emulated TSS
> destructors on my local machine with a special macro for testing, and
> then argued with CI for a while until the other machines agreed.
> Otherwise, it's all a fairly thin wrapper and hopefully not suprising.

In the Windows implementation of pg_tss_create(), how do threads and 
FlsAlloc() interact? If I understand correctly, one thread can run 
multiple "fibers" in a co-operative fashion. It's a legacy feature, not 
widely used nowadays, but what will happen if someone does try to use 
fibers? I think that will result in chaos. You're using FlsAlloc() to 
install the destructor callback but TlsAlloc() for allocating the actual 
thread-local values. So all fibers running on the same thread will share 
the storage, but the destructor will be called whenever any of the 
fibers exit, clearing the TSS storage for all fibers.

I don't know much about Windows or fibers but mixing the Tls* and Fls* 
functions seems unwise.

To be honest, I think we should just accept the limitation that TSS 
destructors don't run on Windows. Yes, it means we'll continue to leak 
memory on ecpg, but that's not a new issue. We can address that as a 
separate patch later, if desired. Or more likely, do nothing until C11 
threads with proper destructors become widely available on Windows.

> One thing this would need to be complete, at least the way I've
> implemented it, is memory barriers, for non-TSO hardware, which would
> require lifting the ban on atomics.h in frontend code, or at least
> parts of it.

+1 on doing that. Although it becomes moot if you just give up on the 
destructors on Windows.

> Only 64 bit emulation is actually tied to the backend
> now (because it calls spinlock stuff, that itself is backend-only, but
> also it doesn't actually need to be either).  Or maybe I can figure
> out a different scheme that doesn't need that.  Or something...

You could use a pg_mtx now instead of a spinlock. I wonder if there are 
any supported platforms left that don't have 64-bit atomics though.

> + * We have some extensions of our own, not present in C11:
> + *
> + * - pg_rwlock_t for read/write locks
> + * - pg_mtx_t static initializer PG_MTX_STATIC_INIT
> + * - pg_barrier_t

Hmm, I don't see anything wrong with those extensions as such, but I 
wonder how badly we really need them?

pg_rwlock is used by:
- Windows implementation of pg_mtx_t, to provide PG_MTX_STATIC_INIT
- the custom Thread-Specific Storage implementation (i.e. Windows)

PG_MTX_STATIC_INIT is used by:
- ecpg
- libpq

pg_barrier_t is used by:
- pgbench

pg_rwlock goes away if you give up on the TSS destructors on Windows, 
and implement pg_mtx directly with SRWLOCK on Windows.

The barrier implementation could easily be moved into pgbench, if we 
don't expect to need it in other places soon. Having a generic 
implementation seems fine too though, it's pretty straightforward.

PG_MTX_STATIC_INIT seems hard to get rid of. I suppose you could use 
call_once() to ensure it's initialized only once, but a static 
initializer is a lot nicer to use.

> diff --git a/src/interfaces/ecpg/ecpglib/ecpglib_extern.h b/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
> index 01b4309a71..d8416b19e3 100644
> --- a/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
> +++ b/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
> @@ -169,7 +169,7 @@ bool        ecpg_get_data(const PGresult *, int, int, int, enum ECPGttype type,
>                            enum ECPGttype, char *, char *, long, long, long,
>                            enum ARRAY_TYPE, enum COMPAT_MODE, bool);
>  
> -void        ecpg_pthreads_init(void);
> +#define ecpg_pthreads_init()
>  struct connection *ecpg_get_connection(const char *connection_name);
>  char       *ecpg_alloc(long size, int lineno);
>  char       *ecpg_auto_alloc(long size, int lineno);

Is it safe to remove the function, or might it be referenced by existing 
binaries that were built against an older ecpglib version? I believe 
it's OK to remove, it's not part of the public API and should not be 
called directly from a binary. But in that case, we don't need the dummy 
#define either.

> +/*-------------------------------------------------------------------------
> + *
> + * Barriers.  Not in C11.  Apple currently lacks the POSIX version.
> + * We assume that the OS might know a better way to implement it that
> + * we do, so we only provide our own if we have to.
> + *
> + *-------------------------------------------------------------------------
> + */
> +
> +#ifdef PG_THREADS_WIN32
> +typedef SYNCHRONIZATION_BARRIER pg_barrier_t;
> +#elif defined(HAVE_PTHREAD_BARRIER)
> +typedef pthread_barrier_t pg_barrier_t;
> +#else
> +typedef struct pg_barrier_t
> +{
> +    bool        sense;
> +    int            expected;
> +    int            arrived;
> +    pg_mtx_t    mutex;
> +    pg_cnd_t    cond;
> +} pg_barrier_t;
> +#endif

I'd suggest calling this pg_thrd_barrier_t or even pg_thread_barrier_t. 
It's a little more verbose, but would rhyme with pthread_barrier_t. And 
to avoid confusing it with memory barriers and ProcSignal barriers.


Comments on the TSS implementation follow, which largely become moot if 
you give up on the destructor support on Windows:

> +void
> +pg_tss_dtor_delete(pg_tss_t tss_id)

Should this be pg_tss_delete(), since the C11 function is tss_delete()? 
Or is this different? There are actually no callers currently, so maybe 
just leave it out. pg_tss_dtor_delete() also seems racy, if run 
concurrently with pg_tss_run_destructors().

> +     * Make sure our destructor hook is registered with the operating system
> +     * in this process. This happens only once in the whole process.  Making
> +     * sure it will run actually in each thread happens in
> +     * pg_tss_ensure_destructors_will_run().

the function is called pg_tss_ensure_destructors_in_this_thread(), not 
pg_tss_ensure_destructors_will_run().

> +/*
> + * Called every time pg_tss_set() installs a non-NULL value.
> + */
> +void
> +pg_tss_ensure_destructors_in_this_thread(void)
> +{
> +    /*
> +     * Pairs with pg_tss_install_run_destructors(), called by pg_tss_create().
> +     * This makes sure that we know if the tss_id being set could possibly
> +     * have a destructor.  We don't want to pay the cost of checking, but we
> +     * can check with a simple load if *any* tss_id has a destructor.  If so,
> +     * we make sure that pg_tss_destructor_hook has a non-NULL value in *this*
> +     * thread, because both Windows and POSIX will only call a destructor for
> +     * a non-NULL value.
> +     */
> +    pg_read_barrier();
> +    if (pg_tss_run_destructors_installed)
> +    {
> +#ifdef PG_THREADS_WIN32
> +        if (FlsGetValue(pg_tss_destructor_hook) == NULL)
> +            FlsSetValue(pg_tss_destructor_hook, (void *) 1);
> +#else
> +        if (pthread_getspecific(pg_tss_destructor_hook) == NULL)
> +            pthread_setspecific(pg_tss_destructor_hook, (void *) 1);
> +#endif
> +    }
> +}
> +#endif

I believe this cannot get called if !pg_tss_run_destructors_installed, 
no need to check that. Because pg_tss_create() will fail
pg_tss_run_destructors is set. Maybe turn it into an Assert. Or 
alternatively, skip the call to pg_tss_install_run_destructors when 
pg_tss_create() is called with destructor==NULL. I think that may have 
been the idea here, based on the comment above.

The write barriers in pg_tss_install_run_destructors() seem excessive. 
The function will only ever run in one thread, protected by 
pg_call_once(). pg_call_once() surely provides all the necessary barriers.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Conflict detection and logging in logical replication
Next
From: Pavel Stehule
Date:
Subject: Re: [PATCH] Add CANONICAL option to xmlserialize