Re: Let's make PostgreSQL multi-threaded - Mailing list pgsql-hackers
From | Tristan Partin |
---|---|
Subject | Re: Let's make PostgreSQL multi-threaded |
Date | |
Msg-id | CT4TNFJFKOY3.22XP8JDT7U0C5@gonk |
In response to | Let's make PostgreSQL multi-threaded (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses | Re: Let's make PostgreSQL multi-threaded |
List | pgsql-hackers |
On Mon Jun 5, 2023 at 9:51 AM CDT, Heikki Linnakangas wrote:
> # Global variables
>
> We have a lot of global and static variables:
>
> $ objdump -t bin/postgres | grep -e "\.data" -e "\.bss" | grep -v "data.rel.ro" | wc -l
> 1666
>
> Some of them are pointers to shared memory structures and can stay as
> they are. But many of them are per-connection state. The most
> straightforward conversion for those is to turn them into thread-local
> variables, like Konstantin did in [0].
>
> It might be good to have some kind of a Session context struct that we
> pass everywhere, or maybe have a single thread-local variable to hold
> it. Many of the global variables would become fields in the Session. But
> that's future work.

+1 to the session context idea, as a follow-up to the simpler
thread_local storage approach.

> # Extensions
>
> A lot of extensions also contain global variables or other things that
> break in a multi-threaded environment. We need a way to label extensions
> that support multi-threading. And in the future, also extensions that
> *require* a multi-threaded server.
>
> Let's add flags to the control file to mark if the extension is
> thread-safe and/or process-safe. If you try to load an extension that's
> not compatible with the server's mode, throw an error.
>
> We might need new functions in addition _PG_init, called at connection
> startup and shutdown. And background worker API probably needs some changes.

It would be a good idea to start exposing a variable through pkg-config
that tells whether the backend is multi-threaded or multi-process.

> # Exposed PIDs
>
> We expose backend process PIDs to users in a few places.
> pg_stat_activity.pid and pg_terminate_backend(), for example. They need
> to be replaced, or we can assign a fake PID to each connection when
> running in multi-threaded mode.

Would it be possible to just transparently slot in the thread ID instead?

> # Thread-safe libraries
>
> Need to switch to thread-safe versions of library functions, e.g.
> uselocale() instead of setlocale().

Seems like a good starting point.

> The Python interpreter has a Global Interpreter Lock. It's not possible
> to create two completely independent Python interpreters in the same
> process, there will be some lock contention on the GIL. Fortunately, the
> python community just accepted https://peps.python.org/pep-0684/. That's
> exactly what we need: it makes it possible for separate interpreters to
> have their own GILs. It's not clear to me if that's in Python 3.12
> already, or under development for some future version, but by the time
> we make the switch in Postgres, there probably will be a solution in
> cpython.

3.12 is the version of Python currently in development, and it is
planned for release in October of this year. A workaround that some
projects use is running multiple Python interpreters[0], though that
seems uncommon. This might be worth noting, depending on the minimum
Python version Postgres aims to support (I'm not sure what that policy
is).

Python's C API also provides mechanisms for releasing the GIL. I am not
familiar with how Postgres uses Python, but I have seen huge performance
improvements from well-placed GIL releases in multi-threaded contexts.
Surely this API would just become a no-op after the PEP is implemented.

[0]: https://peps.python.org/pep-0684/#existing-use-of-multiple-interpreters

--
Tristan Partin
Neon (https://neon.tech)
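A minimal C sketch of the two-step migration discussed in this thread, under stated assumptions: first a per-connection global becomes a C11 `_Thread_local` variable, then per-connection fields are gathered into a `Session` struct reached through a single thread-local pointer. `Session`, `CurrentSession`, `tl_queries_run`, `backend_main()`, and `run_backends()` are hypothetical names chosen for illustration, not PostgreSQL code, and plain POSIX threads stand in for whatever threading layer the server would actually adopt.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; not taken from the PostgreSQL tree. */
#define MAX_BACKENDS 64

/* Step 1: a former global becomes thread-local, so each backend thread
 * gets its own zero-initialized copy. */
static _Thread_local int tl_queries_run = 0;

/* Step 2: per-connection fields live in one struct, reachable through a
 * single thread-local pointer (hypothetical layout). */
typedef struct Session
{
    int         connection_id;
    int         queries_run;
} Session;

static _Thread_local Session *CurrentSession = NULL;

/* Simulate one backend executing n queries, updating both kinds of state. */
static void
run_queries(int n)
{
    assert(CurrentSession != NULL);
    for (int i = 0; i < n; i++)
    {
        tl_queries_run++;
        CurrentSession->queries_run++;
    }
}

typedef struct BackendArg
{
    int         connection_id;
    int         queries;        /* number of queries this connection runs */
    int         tl_result;      /* tl_queries_run seen by the thread at exit */
    int         session_result; /* Session.queries_run seen at exit */
} BackendArg;

/* Thread entry point standing in for a per-connection backend main loop. */
static void *
backend_main(void *p)
{
    BackendArg *arg = p;
    Session     session = {arg->connection_id, 0};

    CurrentSession = &session;  /* visible only to this thread */
    run_queries(arg->queries);

    arg->tl_result = tl_queries_run;
    arg->session_result = CurrentSession->queries_run;
    CurrentSession = NULL;      /* session is about to go out of scope */
    return NULL;
}

/* Start n "connections"; connection i runs (i + 1) * 10 queries.
 * Returns 0 on success, -1 on bad input or a thread create/join failure. */
int
run_backends(int n, BackendArg args[])
{
    pthread_t   threads[MAX_BACKENDS];
    int         started = 0;
    int         rc = 0;

    if (n < 0 || n > MAX_BACKENDS)
        return -1;
    for (; started < n; started++)
    {
        args[started] = (BackendArg) {.connection_id = started,
                                      .queries = (started + 1) * 10};
        if (pthread_create(&threads[started], NULL, backend_main,
                           &args[started]) != 0)
        {
            rc = -1;
            break;
        }
    }
    /* Always join whatever was started, even after a failure. */
    for (int i = 0; i < started; i++)
        if (pthread_join(threads[i], NULL) != 0)
            rc = -1;
    return rc;
}
```

Calling `run_backends(4, args)` should leave `args[i].tl_result` and `args[i].session_result` equal to `(i + 1) * 10` for each thread, while the calling thread's own `tl_queries_run` stays 0 and its `CurrentSession` stays NULL, because every thread works on private copies. The single-pointer variant needs only one thread-local per connection, which would also make it easier to later pass the `Session` explicitly, as suggested in the quoted proposal.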