Re: Postgres with pthread - Mailing list pgsql-hackers
From | Konstantin Knizhnik
---|---
Subject | Re: Postgres with pthread
Date |
Msg-id | 169c12a9-adb3-6ff0-dda9-86822cb077c7@postgrespro.ru
In response to | Re: Postgres with pthread (Simon Riggs <simon@2ndquadrant.com>)
Responses | Re: Postgres with pthread; Re: Postgres with pthread
List | pgsql-hackers
I want to thank everybody for the feedback and the many useful comments. I am very pleased with the community's interest in this topic and will continue research in this direction. Some more comments from my side:

My original intention was to implement some kind of built-in connection pooling for Postgres: the ability to execute several transactions in one backend. It requires some kind of lightweight multitasking (coroutines). The obvious candidate for it is libcore. In this case we also need to solve the problem with static variables, and __thread will not help here: we would have to collect all static variables into some structure (a session context) and replace every reference to such a variable with an indirection through a pointer. This is much harder to implement than annotating variable definitions with __thread, because it requires changing every access to those variables, so almost all Postgres code would have to be refactored.

Another problem with this approach is that it needs asynchronous disk IO. Unfortunately there is no good file AIO implementation for Linux. Certainly we could spawn a dedicated IO thread (or threads) and queue IO requests to it, but such an architecture becomes quite complex. Also, cooperative multitasking by itself cannot load all CPU cores, so we would need several physical processes/threads to execute the coroutines.

In theory such an architecture should provide the best performance and scalability (handling hundreds of thousands of client connections). But in practice there are a lot of pitfalls:

1. Right now each backend has its own relation, catalog and prepared statement caches. For a large database these caches can be quite large: several megabytes. So such coroutines are really not "lightweight". The obvious solution is to have global caches, or to combine global and local caches, but that once again requires significant changes in Postgres.
2. A large number of sessions makes the current procarray approach almost unusable: we would need to provide some alternative implementation of snapshots, for example a CSN-based one.
3. All locking mechanisms would have to be rewritten.

So this approach almost excludes the possibility of evolving the existing Postgres code base and requires a "revolution": rewriting most Postgres components from scratch and refactoring almost all the rest of the code. This is why I have to abandon moving in this direction.

Replacing processes with threads can be considered just a first step, and it requires changes in many Postgres components if we really want to get significant advantages from it. But at least such work can be split into several phases, and it is possible for some time to support both the multithreaded and the multiprocess model in the same code base.

Below I want to summarize the most important (from my point of view) arguments pro and contra the multithreaded model that I got from your feedback.

Pros:
1. Simplified memory model: no need for DSM, shm_mq, DSA, etc.
2. Efficient integration of PLs supporting multithreaded execution, first of all Java.
3. Smaller memory footprint, faster context switching, more efficient use of the TLB.

Cons:
1. Breaks compatibility with existing extensions and adds more requirements for authors of new extensions.
2. Problems with integration of single-threaded PLs: Python, Lua, ...
3. Worse protection from programming errors, including errors in extensions.
4. Lack of explicit separation of shared and private memory leads to more synchronization errors (see the sketch below).
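To make the static-variable problem concrete, here is a minimal C sketch of what happens to a backend-local static variable in the three models (process per session, thread per session, coroutines). The names are hypothetical and this is not taken from the actual Postgres sources; it only illustrates the kind of change each model requires:

    /* Hypothetical sketch only -- names are made up, this is not Postgres code. */

    /* Process-per-session (current model): after fork() every backend gets its
     * own copy of a plain static, so no annotation and no locking are needed. */
    static int counter_process_model;

    /* Thread-per-session model: __thread gives every thread its own copy;
     * only the declaration changes, the access sites stay the same. */
    static __thread int counter_thread_model;

    /* Many-sessions-per-thread (coroutine) model: __thread no longer helps,
     * because several sessions share one thread.  Every former static has to
     * move into an explicit session context, and every access has to be
     * rewritten as an indirection through the current-session pointer. */
    typedef struct SessionContext
    {
        int counter;
        /* ...every other formerly-static variable... */
    } SessionContext;

    static __thread SessionContext *current_session; /* set on each coroutine switch */

    static void
    bump_counter(void)
    {
        current_session->counter++;   /* access site must be rewritten */
    }

    int
    main(void)
    {
        SessionContext s1 = {0};

        counter_process_model++;      /* unchanged code */
        counter_thread_model++;       /* unchanged code, new declaration */

        current_session = &s1;        /* a coroutine scheduler would do this */
        bump_counter();
        return 0;
    }

The first two variants change only the declaration; the third changes every access site, which is why the coroutine approach would require refactoring almost the whole code base, while the thread-per-session model does not.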
Right now in Postgres there is a strict distinction between shared memory and private memory, so it is clear to the programmer whether (s)he is working with shared data and therefore needs some kind of synchronization to avoid race conditions. With pthreads all memory is shared and more care is needed when working with it.

So pthreads can help to increase scalability, but they still do not help much with the implementation of built-in connection pooling, autonomous transactions, ... The current 50% improvement in SELECT speed for a large number of connections certainly cannot be considered sufficient motivation for such a radical change of the Postgres architecture. But it is just a first step, and many more benefits can be obtained by adapting Postgres to this model. It is hard for me to estimate now the full complexity of switching to the thread model and all the advantages we can get from it.

First of all I am going to repeat my benchmarks on SMP computers with a large number of cores (so that 100 or more active backends can really be useful even in the case of connection pooling).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company