Re: Postgres with pthread - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Postgres with pthread
Date
Msg-id 8c9212eb-cb6f-1cfd-9fce-84ec01390b20@postgrespro.ru
In response to Re: Postgres with pthread  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
I continue experiments with my pthread prototype.
The latest results are as follows:

1. I have eliminated all (I hope) calls of non-reentrant functions 
(getopt, setlocale, setitimer, localtime, ...). So now the parallel 
tests pass.

2. I have implemented deallocation of the top memory context at thread 
exit and cleanup of all open file descriptors.
I had to replace several places where malloc is used with top_malloc, 
which allocates in the top context.

3. My prototype now passes all regression tests, but error handling is 
still far from complete.

4. I have experimented with replacing the synchronization primitives 
used in Postgres with their pthread analogues.
Unfortunately this has almost no influence on performance.

5. Handling a large number of connections.
The maximal number of Postgres connections is almost the same: 100k.
But the memory footprint in the pthread case was significantly smaller: 
18GB vs. 38GB.
And the difference in performance was much larger still: 60k TPS 
(processes) vs. 600k TPS (pthreads).
Compare this with the performance for 10k clients: 1300k TPS.
This is the read-only pgbench -S test with 1000 connections per pgbench 
instance; since pgbench doesn't allow specifying more than 1000 clients, 
I spawned several instances of pgbench.

Why is handling a large number of connections important?
It allows applications to access Postgres directly, without pgbouncer 
or any other external connection pooling tool.
In this case an application can use prepared statements, which can make 
simple queries almost twice as fast.
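To illustrate the prepared-statement point: with libpq, a statement is 
parsed and planned once and then only bound and executed on each call. 
This sketch assumes a reachable server with the standard 
pgbench_accounts table; it is an illustration, not code from the 
prototype:

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=pgbench");
    PGresult *res;
    const char *aid = "1";

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Parse and plan once... */
    res = PQprepare(conn, "get_balance",
                    "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                    1, NULL);
    PQclear(res);

    /* ...then only bind and execute on every subsequent call, which
     * is what buys the ~2x speedup for simple queries. */
    res = PQexecPrepared(conn, "get_balance", 1, &aid, NULL, NULL, 0);
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("abalance = %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);

    PQfinish(conn);
    return 0;
}
```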

Unfortunately, Postgres sessions are not lightweight. Each backend 
maintains its own private catalog and relation caches, prepared 
statement cache, etc.
For a real database these caches can occupy several megabytes of 
memory, and warming them can take a significant amount of time.
So if we really want to support a large number of connections, we 
should rewrite these caches to be global (shared).
That would save a lot of memory but add synchronization overhead.
Also, on NUMA systems private caches may be more efficient than one 
global cache.

My prototype can be found at: 
git://github.com/postgrespro/postgresql.pthreads.git


-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


