Thread: Postgres with pthread

Postgres with pthread

From
Konstantin Knizhnik
Date:
Hi hackers,

As far as I remember, several years ago, when the implementation of intra-query parallelism was just getting started, there was a discussion about whether to use threads or keep the traditional Postgres process architecture. The decision was made to keep processes. So now we have bgworkers, the shared message queue, DSM, ...
The main argument for that decision was that switching to threads would require rewriting most of the Postgres code.
It seemed like quite a reasonable argument, and until now I agreed with it.

But recently I wanted to check it myself.
The first problem with porting Postgres to pthreads is the static variables widely used in the Postgres code.
Most modern compilers support thread-local variables; for example, GCC provides the __thread keyword.
Such variables are placed in a separate segment which is addressed through a segment register (on Intel).
So access to such variables is as fast as access to normal static variables.
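As a quick illustration of how __thread behaves, here is a minimal sketch (hypothetical demo code, not from the patch): each thread gets its own copy of the variable, so existing single-threaded code that reads and writes it needs no locking.

```c
#include <pthread.h>
#include <stddef.h>

/* Each thread sees its own copy of this "static" variable. */
static __thread int my_counter = 0;

static void *worker(void *arg)
{
    for (int i = 0; i < 1000; i++)
        my_counter++;           /* touches only this thread's copy */
    *(int *) arg = my_counter;  /* report the per-thread final value */
    return NULL;
}

/* Returns 1 if TLS isolated the three copies as expected. */
int tls_demo(void)
{
    pthread_t   t1, t2;
    int         r1 = 0, r2 = 0;

    my_counter = 42;            /* main thread's copy, untouched by workers */
    pthread_create(&t1, NULL, worker, &r1);
    pthread_create(&t2, NULL, worker, &r2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Each worker started from 0, not from this thread's 42. */
    return r1 == 1000 && r2 == 1000 && my_counter == 42;
}
```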

Certainly, not all compilers may have built-in support for TLS, and it may not be implemented as efficiently on all hardware platforms as on Intel.
So such an approach certainly decreases the portability of Postgres. But IMHO it is not that critical.

What I have done:
1. Added session_local (defined as __thread) to the definitions of most static and global variables.
I left some variables that point to shared memory as plain statics. I also had to change the initialization of some static variables,
because the address of a TLS variable cannot be used in a static initializer.
2. Changed the implementation of GUCs to make them thread-specific.
3. Replaced fork() with pthread_create().
4. Rewrote the file descriptor cache to be global (shared by all threads).
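To make step 1 concrete, here is a hedged sketch of what the annotation and the static-initializer problem could look like; `session_local` is the qualifier named above, but the variable names are invented here for illustration:

```c
/* The qualifier from step 1 above; a macro, so it can be defined away
 * on platforms without TLS support. */
#define session_local __thread

/* Formerly a process-wide global; now one copy per backend thread. */
static session_local int fake_guc = 4096;   /* illustrative name only */

/*
 * The address of a TLS variable is not a load-time constant, so a
 * static initializer such as
 *
 *     static int *fake_guc_ptr = &fake_guc;   // rejected by the compiler
 *
 * has to be replaced with explicit per-thread initialization:
 */
static session_local int *fake_guc_ptr;

void session_init(void)
{
    fake_guc_ptr = &fake_guc;   /* run once in each new thread */
}
```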

I have not changed the Postgres synchronization primitives or shared memory at all.
It took me about one week of work.

What is not done yet:
1. Handling of signals (I expect that the Win32 code can somehow be reused here).
2. Deallocation of memory and closing of files on backend (thread) termination.
3. Interaction of the postmaster and backends with the PostgreSQL auxiliary processes (threads), such as autovacuum, bgwriter, checkpointer, stats collector, ...

What are the advantages of using threads instead of processes?

1. No need to use shared memory, so there is no static limit on the amount of memory which can be used by Postgres, and no need for dynamic shared memory and the other machinery designed to share memory between backends and bgworkers.
2. Threads significantly simplify the implementation of parallel algorithms: interaction and data transfer between threads can be done easily and more efficiently.
3. It is possible to use more efficient/lightweight synchronization primitives. Postgres now mostly relies on its own low-level sync primitives, whose user-level implementation uses spinlocks and atomics and then falls back to OS semaphores/poll. I am not sure how much we can gain by replacing these primitives with ones optimized for threads.
A colleague from the Firebird community told me that just replacing processes with threads gave a 20% increase in performance, and that was just the first step: replacing the sync primitives can give a much greater advantage. But maybe for Postgres, with its low-level primitives, that is not true.
4. Threads are more lightweight entities than processes. A context switch between threads takes less time than between processes, and threads consume less memory. It is usually possible to spawn more threads than processes.
5. More efficient access to virtual memory. As all threads share the same address space, the TLB is used much more efficiently.
6. Faster backend startup. Certainly, starting a backend for each user request is a bad idea in any case; some kind of connection pooling should be used to provide acceptable performance. Still, starting a new backend process in Postgres causes a lot of page faults, which have a dramatic impact on performance, and there is no such problem with threads.

Certainly, processes also have some advantages compared with threads:
1. Better isolation and error protection
2. Easier error handling
3. Easier control of used resources

But that is theory. The main idea of this prototype was to prove or disprove these expectations in practice.
I didn't expect large differences in performance, because the synchronization primitives are unchanged and I performed my experiments on Linux, where threads and processes are implemented in a similar way.

Below are some results (1000xTPS) of select-only (-S) pgbench with scale 100 on my desktop with a quad-core i7-4770 3.40GHz and 16GB of RAM:

Connections    Vanilla/default    Vanilla/prepared    pthreads/default    pthreads/prepared
10             100                191                 106                 207
100            67                 131                 105                 168
1000           41                 65                  55                  102

As you can see, for a small number of connections the results are almost the same, but for a large number of connections pthreads show less degradation.

You can look at my prototype here:
https://github.com/postgrespro/postgresql.pthreads.git

But please notice that it is a very raw prototype. A lot of stuff is not working yet. And supporting all of the existing Postgres functionality will require
much more effort (and even more effort is needed to optimize Postgres for this architecture).

I just want to get some feedback and learn whether the community is interested in any further work in this direction.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Postgres with pthread

From
Tom Lane
Date:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> Below are some results (1000xTPS) of select-only (-S) pgbench with scale
> 100 at my desktop with quad-core i7-4770 3.40GHz and 16Gb of RAM:

> Connections    Vanilla/default       Vanilla/prepared
> pthreads/defaultpthreads/prepared
> 10                    100 191                      
> 106                         207
> 100                  67 131                      
> 105                         168
> 1000                41 65                        
> 55                           102

This table is so mangled that I'm not very sure what it's saying.
Maybe you should have made it an attachment?

However, if I guess at which numbers are supposed to be what,
it looks like even the best case is barely a 50% speedup.
That would be worth pursuing if it were reasonably low-hanging
fruit, but converting PG to threads seems very far from being that.

I think you've done us a very substantial service by pursuing
this far enough to get some quantifiable performance results.
But now that we have some results in hand, I think we're best
off sticking with the architecture we've got.

            regards, tom lane


Re: Postgres with pthread

From
Adam Brusselback
Date:
Here it is formatted a little better:

Connections    Vanilla/default    Vanilla/prepared    pthreads/default    pthreads/prepared
10             100                191                 106                 207
100            67                 131                 105                 168
1000           41                 65                  55                  102

So a little over 50% performance improvement for a couple of the test cases.



On Wed, Dec 6, 2017 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> Below are some results (1000xTPS) of select-only (-S) pgbench with scale
> 100 at my desktop with quad-core i7-4770 3.40GHz and 16Gb of RAM:

> Connections    Vanilla/default       Vanilla/prepared
> pthreads/defaultpthreads/prepared
> 10                    100 191                      
> 106                         207
> 100                  67 131                      
> 105                         168
> 1000                41 65                        
> 55                           102

This table is so mangled that I'm not very sure what it's saying.
Maybe you should have made it an attachment?

However, if I guess at which numbers are supposed to be what,
it looks like even the best case is barely a 50% speedup.
That would be worth pursuing if it were reasonably low-hanging
fruit, but converting PG to threads seems very far from being that.

I think you've done us a very substantial service by pursuing
this far enough to get some quantifiable performance results.
But now that we have some results in hand, I think we're best
off sticking with the architecture we've got.

                        regards, tom lane


Re: Postgres with pthread

From
Andres Freund
Date:
Hi!

On 2017-12-06 19:40:00 +0300, Konstantin Knizhnik wrote:
> As far as I remember, several years ago, when the implementation of intra-query
> parallelism was just getting started, there was a discussion about whether to use
> threads or keep the traditional Postgres process architecture. The decision was
> made to keep processes. So now we have bgworkers, the shared message queue, DSM, ...
> The main argument for that decision was that switching to threads would
> require rewriting most of the Postgres code.

> It seemed like quite a reasonable argument, and until now I agreed with it.
> 
> But recently I wanted to check it myself.

I think that's something pretty important to play with. There've been
several discussions lately, both on and off list / in person, that we're
taking on more-and-more technical debt just because we're using
processes. Besides the above, we've grown:
- a shared memory allocator
- a shared memory hashtable
- weird looking thread aware pointers
- significant added complexity in various projects due to addresses not
  being mapped to the same address etc.


> The first problem with porting Postgres to pthreads is the static variables
> widely used in the Postgres code.
> Most modern compilers support thread-local variables; for example, GCC
> provides the __thread keyword.
> Such variables are placed in a separate segment which is addressed through
> a segment register (on Intel).
> So access to such variables is as fast as access to normal static variables.

I experimented similarly. Although I'm not 100% sure that if we were to go
for it, we wouldn't instead want to abstract our session concept
further, or well, at all.


> Certainly, not all compilers may have built-in support for TLS, and it may
> not be implemented as efficiently on all hardware platforms as on Intel.
> So such an approach certainly decreases the portability of Postgres. But IMHO
> it is not that critical.

I'd agree there, but I don't think the project necessarily does.


> What I have done:
> 1. Added session_local (defined as __thread) to the definitions of most static
> and global variables.
> I left some variables that point to shared memory as plain statics. I also had
> to change the initialization of some static variables, because the address of
> a TLS variable cannot be used in a static initializer.
> 2. Changed the implementation of GUCs to make them thread-specific.
> 3. Replaced fork() with pthread_create().
> 4. Rewrote the file descriptor cache to be global (shared by all threads).

That one I'm very unconvinced of, that's going to add a ton of new
contention.


> What are the advantages of using threads instead of processes?
> 
> 1. No need to use shared memory, so there is no static limit on the amount of
> memory which can be used by Postgres, and no need for dynamic shared memory
> and the other machinery designed to share memory between backends and
> bgworkers.

This imo is the biggest part. We can stop duplicating OS and our own
implementations in a shmem aware way.


> 2. Threads significantly simplify the implementation of parallel algorithms:
> interaction and data transfer between threads can be done easily and
> more efficiently.

That's imo the same as 1.


> 3. It is possible to use more efficient/lightweight synchronization
> primitives. Postgres now mostly relies on its own low-level sync primitives,
> whose user-level implementation uses spinlocks and atomics and then falls
> back to OS semaphores/poll. I am not sure how much we can gain by replacing
> these primitives with ones optimized for threads.
> A colleague from the Firebird community told me that just replacing processes
> with threads gave a 20% increase in performance, and that was just the first
> step: replacing the sync primitives can give a much greater advantage. But
> maybe for Postgres, with its low-level primitives, that is not true.

I don't believe that that's actually the case to any significant degree.


> 6. Faster backend startup. Certainly, starting a backend for each user request
> is a bad idea in any case; some kind of connection pooling should be used to
> provide acceptable performance. Still, starting a new backend process in
> Postgres causes a lot of page faults, which have a dramatic impact on
> performance, and there is no such problem with threads.

I don't buy this in itself. The connection establishment overhead isn't
largely the fork, it's all the work afterwards. I do think it makes
connection pooling etc easier.


> I just want to get some feedback and learn whether the community is interested
> in any further work in this direction.

I personally am. I think it's beyond high time that we move to take
advantage of threads.

That said, I don't think just replacing processes with threads is the
right thing. I'm pretty sure we'd still want to have postmaster as a
separate process, for robustness. Possibly we even want to continue
having various processes around besides that; the most interesting cases
involving threads are around intra-query parallelism, and pooling, and
for both a hybrid model could be beneficial.

I think that we probably initially want some optional move to
threads. Most extensions won't initially be thread ready, and imo we
should continue to work with that for a while, just refusing to use
parallelism if any loaded shared library doesn't signal parallelism
support. We also don't necessarily want to require threads on all
platforms at the same time.

I think the biggest problem with doing this for real is that it's a huge
project, and that it'll take a long time.

Thanks for working on this!

Andres Freund


Re: Postgres with pthread

From
Robert Haas
Date:
On Wed, Dec 6, 2017 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> barely a 50% speedup.

I think that's an awfully strange choice of adverb.  This is, by its
author's own admission, a rough cut at this, probably with very little
of the optimization that could ultimately be done, and it's already
buying 50% on some test cases?  That sounds phenomenally good to me.
A 50% speedup is huge, and chances are that it can be made quite a bit
better with more work, or that it already is quite a bit better with
the right test case.

TBH, based on previous discussion, I expected this to initially be
*slower* but still worthwhile in the long run because of optimizations
that it would let us do eventually with parallel query and other
things.  If it's this much faster out of the gate, that's really
exciting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Postgres with pthread

From
Andres Freund
Date:
Hi,

On 2017-12-06 11:53:21 -0500, Tom Lane wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> However, if I guess at which numbers are supposed to be what,
> it looks like even the best case is barely a 50% speedup.

"barely a 50% speedup" - Hah. I don't believe the numbers, but that'd be
huge.


> That would be worth pursuing if it were reasonably low-hanging
> fruit, but converting PG to threads seems very far from being that.

I don't think immediate performance gains are the interesting part about
using threads. It's rather what their absence adds a lot in existing /
submitted code complexity, and makes some very commonly requested
features a lot harder to implement:

- we've a lot of duplicated infrastructure around dynamic shared
  memory. dsm.c dsa.c, dshash.c etc. A lot of these, especially dsa.c,
  are going to become a lot more complicated over time, just look at how
  complicated good multi threaded allocators are.

- we're adding a lot of slowness to parallelism, just because we have
  different memory layouts in different processes. Instead of just
  passing pointers through queues, we put entire tuples in there. We
  deal with dsm aware pointers.

- a lot of features have been a lot harder (parallelism!), and a lot of
  frequently requested ones are so hard due to processes that they never
  got off ground (in-core pooling, process reuse, parallel worker reuse)

- due to the statically sized shared memory a lot of our configuration
  is pretty fundamentally PGC_POSTMASTER, even though that presents a lot
  of administrative problems.

...


> I think you've done us a very substantial service by pursuing
> this far enough to get some quantifiable performance results.
> But now that we have some results in hand, I think we're best
> off sticking with the architecture we've got.

I don't agree.

I'd personally expect that an immediate conversion would result in very
little speedup, a bunch of code deleted, a bunch of complexity
added. And it'd still be massively worthwhile, to keep medium to long
term complexity and feature viability in control.

Greetings,

Andres Freund


Re: Postgres with pthread

From
Adam Brusselback
Date:
> "barely a 50% speedup" - Hah. I don't believe the numbers, but that'd be
> huge.
They are numbers derived from a benchmark that any sane person would
run with a connection pool in a production environment, but
impressive if true nonetheless.


Re: Postgres with pthread

From
Andreas Karlsson
Date:
On 12/06/2017 06:08 PM, Andres Freund wrote:
> I think the biggest problem with doing this for real is that it's a huge
> project, and that it'll take a long time.

An additional issue is that this could break a lot of extensions, and in
a way that is not apparent at compile time. This means we may need to
break all extensions to force extension authors to check whether they are
thread safe.

I do not like making life hard for our extension community, but if the
gains are big enough it might be worth it.

> Thanks for working on this!

Seconded.

Andreas


Re: Postgres with pthread

From
Robert Haas
Date:
On Wed, Dec 6, 2017 at 12:08 PM, Andres Freund <andres@anarazel.de> wrote:
>> 4. Rewrote the file descriptor cache to be global (shared by all threads).
>
> That one I'm very unconvinced of, that's going to add a ton of new
> contention.

It might be OK on systems where we can use pread()/pwrite().
Otherwise it's going to be terrible.

> That said, I don't think just replacing processes with threads is the
> right thing. I'm pretty sure we'd still want to have postmaster as a
> separate process, for robustness.

+1.  The tendency of the postmaster to not die has been a huge boon to
the reliability of PostgreSQL - I would not like to give that up.
MySQL ends up needing safe_mysqld to cope with this issue; our idea of
having it built into the server is better.

> Possibly we even want to continue having various
> processes around besides that, the most interesting cases involving
> threads are around intra-query parallelism, and pooling, and for both a
> hybrid model could be beneficial.

I think if we only use threads for intra-query parallelism we're
leaving a lot of money on the table.  For example, if all
shmem-connected backends are using the same process, then we can make
max_locks_per_transaction PGC_SIGHUP.  That would be sweet, and there
are probably plenty of similar things.  Moreover, if threads are this
thing that we only use now and then for parallel query, then our
support for them will probably have bugs.  If we use them all the
time, we'll actually find the bugs and fix them.  I hope.

> I think the biggest problem with doing this for real is that it's a huge
> project, and that it'll take a long time.

+1

> Thanks for working on this!

+1

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Postgres with pthread

From
Andres Freund
Date:
Hi,

On 2017-12-06 12:28:29 -0500, Robert Haas wrote:
> > Possibly we even want to continue having various
> > processes around besides that, the most interesting cases involving
> > threads are around intra-query parallelism, and pooling, and for both a
> > hybrid model could be beneficial.
> 
> I think if we only use threads for intra-query parallelism we're
> leaving a lot of money on the table.  For example, if all
> shmem-connected backends are using the same process, then we can make
> max_locks_per_transaction PGC_SIGHUP.  That would be sweet, and there
> are probably plenty of similar things.  Moreover, if threads are this
> thing that we only use now and then for parallel query, then our
> support for them will probably have bugs.  If we use them all the
> time, we'll actually find the bugs and fix them.  I hope.

I think it'd make a lot of sense to go there gradually. I agree that we
probably want to move to more and more use of threads, but we also want
our users not to kill us ;). Initially we'd surely continue to use
partitioned dynahash for locks, which'd make resizing infeasible
anyway. Similar for shared buffers (which I find a hell of a lot more
interesting to change at runtime than max_locks_per_transaction), etc...

- Andres


Re: Postgres with pthread

From
Thomas Munro
Date:
On Thu, Dec 7, 2017 at 6:08 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-12-06 19:40:00 +0300, Konstantin Knizhnik wrote:
>> As far as I remember, several years ago, when the implementation of intra-query
>> parallelism was just getting started, there was a discussion about whether to use
>> threads or keep the traditional Postgres process architecture. The decision was
>> made to keep processes. So now we have bgworkers, the shared message queue, DSM, ...
>> The main argument for that decision was that switching to threads would
>> require rewriting most of the Postgres code.
>
>> It seemed like quite a reasonable argument, and until now I agreed with it.
>>
>> But recently I wanted to check it myself.
>
> I think that's something pretty important to play with. There've been
> several discussions lately, both on and off list / in person, that we're
> taking on more-and-more technical debt just because we're using
> processes. Besides the above, we've grown:
> - a shared memory allocator
> - a shared memory hashtable
> - weird looking thread aware pointers
> - significant added complexity in various projects due to addresses not
>   being mapped to the same address etc.

Yes, those are all workarounds for an ancient temporary design choice.
To quote from a 1989 paper[1] "Currently, POSTGRES runs as one process
for each active user. This was done as an expedient to get a system
operational as quickly as possible. We plan on converting POSTGRES to
use lightweight processes [...]".  +1 for sticking to the plan.

While personally contributing to the technical debt items listed
above, I always imagined that all that machinery could become
compile-time options controlled with --with-threads and
dsa_get_address() would melt away leaving only raw pointers, and
dsa_area would forward to the MemoryContext + ResourceOwner APIs, or
something like that.  It's unfortunate that we lose type safety along
the way though.  (If only there were some way we could write
dsa_pointer<my_type>.  In fact it was also a goal of the original
project to adopt C++, based on a comment in 4.2's nodes.h: "Eventually
this code should be transmogrified into C++ classes, and this is more
or less compatible with those things.")

If there were a good way to reserve (but not map) a large address
range before forking, there could also be an intermediate build mode
that keeps the multi-process model but where DSA behaves as above,
which might be an interesting way to decouple the
DSA-go-faster-and-reduce-tech-debt project from the threading project.
We could manage the reserved address space ourselves and map DSM
segments with MAP_FIXED, so dsa_get_address() address decoding could
be compiled away.  One way would be to mmap a huge range backed with
/dev/zero, and then map-with-MAP_FIXED segments over the top of it and
then remap /dev/zero back into place when finished, but that sucks
because it gives you that whole mapping in your core files and relies
on overcommit which we don't like, hence my interest in a way to
reserve but not map.

>> The first problem with porting Postgres to pthreads is the static variables
>> widely used in the Postgres code.
>> Most modern compilers support thread-local variables; for example, GCC
>> provides the __thread keyword.
>> Such variables are placed in a separate segment which is addressed through
>> a segment register (on Intel).
>> So access to such variables is as fast as access to normal static variables.
>
> I experimented similarly. Although I'm not 100% sure that if were to go
> for it, we wouldn't instead want to abstract our session concept
> further, or well, at all.

Using a ton of thread local variables may be a useful stepping stone,
but if we want to be able to separate threads/processes from sessions
eventually then I guess we'll want to model sessions as first class
objects and pass them around explicitly or using a single TLS variable
current_session.
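That idea can be sketched in a few lines (all names hypothetical): session state lives in an ordinary heap object, `current_session` is the only thread-local, and detaching a session from a thread is a single pointer swap.

```c
#include <stddef.h>

/* Hypothetical first-class session object: state that today lives in
 * scattered process globals, collected into one struct. */
typedef struct Session
{
    int         id;
    const char *database;
    /* ... GUC values, caches, transaction state, ... */
} Session;

/* The single thread-local entry point; code refers to
 * current_session->... instead of bare globals. */
static __thread Session *current_session;

void attach_session(Session *s)
{
    current_session = s;
}

/* Unbind the session so another thread (or a pool) can pick it up. */
Session *detach_session(void)
{
    Session *s = current_session;

    current_session = NULL;
    return s;
}
```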

> I think the biggest problem with doing this for real is that it's a huge
> project, and that it'll take a long time.
>
> Thanks for working on this!

+1

[1] http://db.cs.berkeley.edu/papers/ERL-M90-34.pdf

-- 
Thomas Munro
http://www.enterprisedb.com


Re: Postgres with pthread

From
Craig Ringer
Date:
On 7 December 2017 at 01:17, Andres Freund <andres@anarazel.de> wrote:
 

>> I think you've done us a very substantial service by pursuing
>> this far enough to get some quantifiable performance results.
>> But now that we have some results in hand, I think we're best
>> off sticking with the architecture we've got.
>
> I don't agree.
>
> I'd personally expect that an immediate conversion would result in very
> little speedup, a bunch of code deleted, a bunch of complexity
> added. And it'd still be massively worthwhile, to keep medium to long
> term complexity and feature viability in control.

Personally I think it's a pity we didn't land up here before the foundations for parallel query went in - DSM, shm_mq, DSA, etc. I know the EDB folks at least looked into it though, and presumably there were good reasons to go in this direction. Maybe that was just "community will never accept threaded conversion" at the time, though.

Now we have quite a lot of homebrew infrastructure to consider if we do a conversion.

That said, it might in some ways make it easier. shm_mq, for example, would likely convert to a threaded backend with minimal changes to callers, and probably only limited changes to shm_mq itself. So maybe these abstractions will prove to have been a win in some ways. Except DSA, and even then it could serve as a transitional API...

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgres with pthread

From
Craig Ringer
Date:
On 7 December 2017 at 05:58, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> Using a ton of thread local variables may be a useful stepping stone,
> but if we want to be able to separate threads/processes from sessions
> eventually then I guess we'll want to model sessions as first class
> objects and pass them around explicitly or using a single TLS variable
> current_session.


Yep.

This is the real reason I'm excited by the idea of a threading conversion.

PostgreSQL's architecture conflates "connection", "session" and "executor" into one somewhat muddled mess. I'd love to be able to untangle that to the point where we can pool executors amongst active queries, while retaining idle sessions' state properly even while they're in a transaction.

Yeah, that's a long way off, but it'd be a whole lot more practical if we didn't have to serialize and deserialize the entire session state to do it.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

RE: Postgres with pthread

From
"Tsunakawa, Takayuki"
Date:
From: Craig Ringer [mailto:craig@2ndquadrant.com]
> I'd personally expect that an immediate conversion would result in very
> little speedup, a bunch of code deleted, a bunch of complexity
> added. And it'd still be massively worthwhile, to keep medium to long
> term complexity and feature viability in control.

+1
I hope for things like:

* More performance statistics, like system-wide LWLock waits, without the concern about fixed shared memory size
* Dynamic memory sizing, such as shared_buffers, work_mem, maintenance_work_mem
* Running multi-threaded components in a postgres extension (is it really safe to run a JVM for PL/Java in a single-threaded postgres?)
 

Regards
Takayuki Tsunakawa



Re: Postgres with pthread

From
Craig Ringer
Date:
On 7 December 2017 at 11:44, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
From: Craig Ringer [mailto:craig@2ndquadrant.com]
> I'd personally expect that an immediate conversion would result in very
> little speedup, a bunch of code deleted, a bunch of complexity
> added. And it'd still be massively worthwhile, to keep medium to long
> term complexity and feature viability in control.

> +1
> I hope for things like:
>
> * More performance statistics, like system-wide LWLock waits, without the concern about fixed shared memory size
> * Dynamic memory sizing, such as shared_buffers, work_mem, maintenance_work_mem

I'm not sure how threaded operations would help us much there. If we could split shared_buffers into extents we could do this with something like dsm already. Without the ability to split it into extents, we can't do it with locally malloc'd memory in a threaded system either.

Re performance diagnostics though, you can already get a lot of useful data from PostgreSQL's SDT tracepoints, which are usable with perf and DTrace amongst other tools. Dynamic userspace 'perf' probes can tell you a lot too.

I'm confident you could collect some seriously useful data with perf tracepoints and 'perf script' these days. (BTW, I extended the https://wiki.postgresql.org/wiki/Profiling_with_perf article a bit yesterday with some tips on this).

Of course better built-in diagnostics would be nice. But I really don't see how it'd have much to do with threaded vs forked model of execution; we can allocate chunks of memory with dsm now, after all.
 
> * Running multi-threaded components in a postgres extension (is it really safe to run a JVM for PL/Java in a single-threaded postgres?)

PL/Java is a giant mess for so many more reasons than that. The JVM is a heavyweight-startup, lightweight-thread system. It doesn't play at all well with postgres's lightweight fork()-based CoW process model. You can't fork() the JVM because fork() doesn't play nice with threads, at all. So you have to start it in each backend individually, which is just awful.

One of the nice things if Pg got a threaded model would be that you could embed a JVM, Mono/.NET runtime, etc and have your sessions work together in ways you cannot currently sensibly do. Folks using MS SQL, Oracle, etc are pretty used to being able to do this, and while it should be done with caution it can offer huge benefits for some complex workloads. 

Right now if a PostgreSQL user wants to do anything involving IPC, shared data, etc, we pretty much have to write quite complex C extensions to do it.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgres with pthread

From
Simon Riggs
Date:
> But that is theory. The main idea of this prototype was to prove or disprove
> these expectations in practice.

> But please notice that it is a very raw prototype. A lot of stuff is not
> working yet.

> And supporting all of the existing Postgres functionality will require
> much more effort (and even more effort is needed to optimize Postgres
> for this architecture).
>
> I just want to get some feedback and learn whether the community is interested
> in any further work in this direction.

Looks good. You are right, it is a theory. If your prototype does
actually show what we think it does then it is a good and interesting
result.

I think we need careful analysis to show where these exact gains come
from. The actual benefit is likely not evenly distributed across the
list of possible benefits. Did they arise because you produced a
stripped down version of Postgres? Or did they arise from using
threads?

It would not be the first time a result shown in a prototype did not
show real gains on a completed project.

I might also read your results to show that connection concentrators
would be a better area of work, since 100 connections perform better
than 1000 in both cases, so why bother optimising for 1000 connections
at all? In which case we should read the benefit at the 100
connections line, where it shows the lower 28% gain, closer to the
gain your colleague reported.

So I think we don't yet have enough to make a decision.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Postgres with pthread

From
Konstantin Knizhnik
Date:
I want to thank everybody for the feedback and many useful notes.
I am very pleased with the community's interest in this topic and will 
continue research in this direction.
Some more comments from my side:

My original intention was to implement some kind of built-in connection 
pooling for Postgres: to be able to execute several transactions in one 
backend.
This requires some kind of lightweight multitasking (coroutines). The 
obvious candidate for it is libcore.

In this case we also need to solve the problem with static variables, 
and __thread will not help here. We would have to collect all static 
variables into some structure (context)
and replace every reference to such a variable with an indirection 
through a pointer. That is much harder to implement than annotating 
variable definitions with __thread:
it requires changing every access to these variables, so almost all 
Postgres code would have to be refactored.

Another problem with this approach is that it needs asynchronous disk 
IO. Unfortunately there is no good file AIO implementation for Linux.
Certainly we can spawn a dedicated IO thread (or threads) and queue IO 
requests to it. But such an architecture seems to become quite complex.

Also, cooperative multitasking by itself cannot load all CPU cores, 
so we need several physical processes/threads to execute the 
coroutines.
In theory such an architecture should provide the best performance and 
scalability (handling hundreds of thousands of client connections). But 
in practice there are a lot of pitfalls:
1. Right now each backend has its own local relation, catalog and 
prepared statement caches. For a large database these caches can be 
quite large: several megabytes.
So such coroutines become not really "lightweight". The obvious 
solution is to have global caches, or to combine global and local 
caches. But that once again requires significant
changes in postgres.
2. A large number of sessions makes the current procarray approach 
almost unusable: we need to provide some alternative implementation of 
snapshots, for example CSN-based.
3. All locking mechanisms have to be rewritten.

So this approach almost excludes evolution of the existing postgres 
code base and instead requires a "revolution": rewriting most Postgres 
components from scratch and refactoring almost all remaining code.
This is why I abandoned moving in this direction.

Replacing processes with threads can be considered just a first step 
and requires changes in many postgres components if we really want to 
get significant advantages from it.
But at least such work can be split into several phases, and it is 
possible for some time to support both the multithreaded and 
multiprocess models in the same codebase.
Below I want to summarize the most important (from my point of view) 
arguments pro/contra multithreading from your feedback:

Pros:
1. Simplified memory model: no need for DSM, shm_mq, DSA, etc.
2. Efficient integration of PLs supporting multithreaded execution, 
first of all Java
3. Smaller memory footprint, faster context switching, more efficient 
use of the TLB

Cons:
1. Breaks compatibility with existing extensions and adds more 
requirements for authors of new extensions
2. Problems with integration of single-threaded PLs: Python, Lua, ...
3. Weaker protection from programming errors, including errors in 
extensions.
4. Lack of explicit separation between shared and private memory leads 
to more synchronization errors.
Right now in Postgres there is a strict distinction between shared 
memory and private memory, so it is clear to the programmer
whether (s)he is working with shared data and therefore needs some kind 
of synchronization to avoid race conditions.
With pthreads all memory is shared, and more care is needed to work with it.

So pthreads can help to increase scalability, but still do not help 
much with implementing built-in connection pooling, autonomous 
transactions, ...

The current 50% improvement in select speed for a large number of 
connections certainly cannot be considered enough motivation for such 
radical changes to the Postgres architecture.
But it is just a first step, and much more benefit can be obtained by 
adapting Postgres to this model.
It is hard for me to estimate now the full complexity of switching to 
the thread model and all the advantages we can get from it.
First of all I am going to repeat my benchmarks on SMP computers with a 
large number of cores (so that 100 or more active backends can be 
really useful even in the case of connection pooling).


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Postgres with pthread

From
Konstantin Knizhnik
Date:
Hi

On 06.12.2017 20:08, Andres Freund wrote:
>
> 4. Rewrite file descriptor cache to be global (shared by all threads).
> That one I'm very unconvinced of, that's going to add a ton of new
> contention.

Do you mean lock contention on the mutex I used to synchronize 
access to the shared file descriptor cache,
or contention for file descriptors?
Right now each thread has its own virtual file descriptors, so they are 
not shared between threads.
But there is a common LRU list restricting the total number of open 
descriptors in the process.

Actually I have no other choice if I want to support thousands of 
connections.
If each thread had its own private descriptor cache (as it is now for 
processes) and its size were estimated based on the open-file quota,
then there would be millions of open file descriptors.

Concerning contention for the mutex, I do not think it is a problem.
At least I can say that performance (with 100 connections) improved 
significantly and shows almost the same speed as for 10 
connections
after I rewrote the file descriptor cache and made it global
(my original implementation just made all fd.c static variables 
thread-local, so each thread had its own separate pool).

It is possible to go further and share file descriptors between threads, 
using pwrite/pread instead of seek+read/write.
But we still need a mutex to implement the LRU list and the free-handle list.


Re: Postgres with pthread

From
Konstantin Knizhnik
Date:

On 07.12.2017 00:58, Thomas Munro wrote:
> Using a ton of thread local variables may be a useful stepping stone,
> but if we want to be able to separate threads/processes from sessions
> eventually then I guess we'll want to model sessions as first class
> objects and pass them around explicitly or using a single TLS variable
> current_session.
>
That was my primary intention.
Unfortunately separating all static variables into some kind of session 
context requires much more effort:
we have to change every access to such variables.

But please notice that, from a performance point of view, access to 
__thread variables is no more expensive than access to a static variable or
to fields of a session context structure through current_session.
And there is no extra space overhead for them.

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Postgres with pthread

From
Craig Ringer
Date:
On 7 December 2017 at 19:55, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

Pros:
1. Simplified memory model: no need for DSM, shm_mq, DSA, etc.

shm_mq would remain useful, and the others could only be dropped if you also dropped process-model support entirely.
  
1. Breaks compatibility with existing extensions and adds more requirements for authors of new extensions

Depends on how much frightening preprocessor magic you're willing to use, doesn't it? ;)

Wouldn't be surprised if simple extensions (C functions etc) stayed fairly happy, but it'd be hazardous enough in terms of library use etc that deliberate breakage may be better.
 
2. Problems with integration of single-threaded PLs: Python, Lua,...

Yeah, that's going to hurt. Especially since most non-plpgsql code out there will be plperl and plpython. Breaking that's not going to be an option, but nobody's going to be happy if all postgres backends must contend for the same Python GIL. Plus it'd be deadlock-city.

That's nearly a showstopper right there. Especially since with a quick look around it looks like the cPython GIL is per-DLL (at least on Windows) not per-interpreter-state, so spawning separate interpreter states per-thread may not be sufficient. That makes sense given that cPython itself is thread-aware; otherwise it'd have a really hard time figuring out which GIL and interpreter state to look at when in a cPython-spawned thread.
 
3. Weaker protection from programming errors, including errors in extensions.

Mainly contaminating memory of unrelated processes, or the postmaster.

I'm not worried about outright crashes. On any modern system it's not significantly worse to take down the postmaster than it is to have it do its own recovery. A modern init will restart it promptly. (If you're not running postgres under an init daemon for production then... well, you should be.)
 
4. Lack of explicit separation of shared and private memory leads to more synchronization errors.

Accidentally clobbering postmaster memory/state would be my main worry there.

Right now we gain a lot of protection from our copy-on-write shared-nothing-by-default model, and we rely on it in quite a lot of places where backends merrily stomp on inherited postmaster state.

The more I think about it, the less enthusiastic I am, really. 
 
--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgres with pthread

From
Robert Haas
Date:
On Wed, Dec 6, 2017 at 10:20 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Personally I think it's a pity we didn't land up here before the foundations
> for parallel query went in - DSM, shm_mq, DSA, etc. I know the EDB folks at
> least looked into it though, and presumably there were good reasons to go in
> this direction. Maybe that was just "community will never accept threaded
> conversion" at the time, though.

Yep.  Never is a long time, but it took 3 release cycles to get a
user-visible feature as it was, and if I'd tried to insist on a
process->thread conversion first I suspect we'd still be stuck on that
point today.  Perhaps we would have gotten as far as getting that much
done, but that wouldn't make parallel query be done on top of it.

> Now we have quite a lot of homebrew infrastructure to consider if we do a
> conversion.
>
> That said, it might in some ways make it easier. shm_mq, for example, would
> likely convert to a threaded backend with minimal changes to callers, and
> probably only limited changes to shm_mq its self. So maybe these
> abstractions will prove to have been a win in some ways. Except DSA, and
> even then it could serve as a transitional API...

Yeah, I don't feel too bad about what we've built.  Even if it
ultimately goes away, it will have served the useful purpose of
proving that parallel query is a good idea and can work.  Besides,
shm_mq is just a ring buffer for messages; that's not automatically
something that we don't want just because we move to threads.  If it
goes away, which I think not unlikely, it'll be because something else
is faster.

Also, it's not as if only parallel query structures might have been
designed differently if we had been using threads all along.
dynahash, for example, is quite unlike most concurrent hash tables and
a big part of the reason is that it has to cope with being situated in
a fixed-size chunk of shared memory.  More generally, the whole reason
there's no cheap, straightforward palloc_shared() is the result of the
current design, and it seems very unlikely we wouldn't have that quite
apart from parallel query.  Install pg_stat_statements without a
server restart?  Yes, please.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Postgres with pthread

From
Andres Freund
Date:
On 2017-12-07 11:26:07 +0800, Craig Ringer wrote:
> PostgreSQL's architecture conflates "connection", "session" and "executor"
> into one somewhat muddled mess.

How is the executor entangled in the other two?

Greetings,

Andres Freund


Re: Postgres with pthread

From
Greg Stark
Date:
On 7 December 2017 at 19:58, Andres Freund <andres@anarazel.de> wrote:
> On 2017-12-07 11:26:07 +0800, Craig Ringer wrote:
>> PostgreSQL's architecture conflates "connection", "session" and "executor"
>> into one somewhat muddled mess.
>
> How is the executor entangled in the other two?

I was going to ask the same question. AFAICS it's the one part of
Postgres that isn't muddled at all -- it's crystal clear that
"connection" == "session" as far as the backend is concerned and
"executor context" is completely separate.

But then I thought about it a bit and I do wonder. I don't know how
well we test having multiple portals doing all kinds of different
query plans with their execution interleaved. And I definitely have
doubts whether you can start SPI sessions from arbitrary points in the
executor expression evaluation and don't know what state you can leave
and resume them from on subsequent evaluations...



-- 
greg


Re: Postgres with pthread

From
Andres Freund
Date:
Hi,

On 2017-12-07 20:48:06 +0000, Greg Stark wrote:
> But then I thought about it a bit and I do wonder. I don't know how
> well we test having multiple portals doing all kinds of different
> query plans with their execution interleaved.

Cursors test that pretty well.


> And I definitely have doubts whether you can start SPI sessions from
> arbitrary points in the executor expression evaluation and don't know
> what state you can leave and resume them from on subsequent
> evaluations...

SPI being weird doesn't really have that much bearing on the executor
structure imo. But I'm unclear what you'd use SPI for that really
necessitates that. We don't suspend execution in the middle of function
execution...

Greetings,

Andres Freund


Re: Postgres with pthread

From
Craig Ringer
Date:
On 8 December 2017 at 03:58, Andres Freund <andres@anarazel.de> wrote:
On 2017-12-07 11:26:07 +0800, Craig Ringer wrote:
> PostgreSQL's architecture conflates "connection", "session" and "executor"
> into one somewhat muddled mess.

How is the executor entangled in the other two?


Executor in the postgres sense isn't, so I chose the word poorly.

"Engine of execution" maybe. What I'm getting at is that we tie up more resources than should ideally be necessary when a session is idle, especially idle in transaction. But I guess a lot of that is really down to memory allocated and not returned to the OS (because like other C programs we can't do that), etc. The key resources like PGXACT entries aren't something we can release while idle in a transaction after all. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgres with pthread

From
konstantin knizhnik
Date:

On Dec 7, 2017, at 10:41 AM, Simon Riggs wrote:

But it is a theory. The main idea of this prototype was to prove or disprove
this expectation at practice.

But please notice that it is very raw prototype. A lot of stuff is not
working yet.

And supporting all of existing Postgres functionality requires
much more efforts (and even more efforts are needed for optimizing Postgres
for this architecture).

I just want to receive some feedback and know if community is interested in
any further work in this direction.

Looks good. You are right, it is a theory. If your prototype does
actually show what we think it does then it is a good and interesting
result.

I think we need careful analysis to show where these exact gains come
from. The actual benefit is likely not evenly distributed across the
list of possible benefits. Did they arise because you produced a
stripped down version of Postgres? Or did they arise from using
threads?

It would not be the first time a result shown in a prototype did not show
real gains on a completed project.

I might also read your results to show that connection concentrators
would be a better area of work, since 100 connections perform better
than 1000 in both cases, so why bother optimising for 1000 connections
at all? In which case we should read the benefit at the 100
connections line, where it shows the lower 28% gain, closer to the
gain your colleague reported.

So I think we don't yet have enough to make a decision.


Concerning the optimal number of connections: one of my intentions was to eliminate the need for an external connection pool (pgbouncer & Co).
In this case applications can use prepared statements, which by itself provides a two-times increase in performance.
I believe that threads have a smaller footprint than processes, so it is possible to spawn more threads and access them directly, without an intermediate connection-pooling layer.


I have performed experiments on a more powerful server: 
144 virtual cores, Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz.

Here the results for read-only queries are different: both the pthreads and vanilla versions show almost the same speed for both 100 and 1000 connections: about 1300k TPS
with prepared statements. So there is no performance degradation with an increased number of connections and no large difference between processes and threads.

But under a read-write workload (pgbench -N) there is still a significant advantage for the pthreads version (kTPS):


Connections   Vanilla   pthreads
        100       165        154
       1000        85        118


For some reason (which I do not know yet) the multiprocess version of postgres is slightly faster at 100 connections, 
but degrades almost twice at 1000 connections, while the degradation of the multithreaded version is not so large.

By the way, the pthreads version makes it much easier to check what is going on using gdb (manual "profiling"):


thread apply all bt
Thread 997 (Thread 0x7f6e08810700 (LWP 61345)):
#0  0x00007f7e03263576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f7e03263668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000698552 in PGSemaphoreLock ()
#3  0x0000000000702804 in LWLockAcquire ()
#4  0x00000000004f9ac4 in XLogInsertRecord ()
#5  0x0000000000503b97 in XLogInsert ()
#6  0x00000000004bb0d1 in log_heap_clean ()
#7  0x00000000004bd7c8 in heap_page_prune ()
#8  0x00000000004bd9c1 in heap_page_prune_opt ()
#9  0x00000000004c43d4 in index_fetch_heap ()
#10 0x00000000004c4410 in index_getnext ()
#11 0x00000000006037d2 in IndexNext ()
#12 0x00000000005f3a80 in ExecScan ()
#13 0x0000000000609eba in ExecModifyTable ()
#14 0x00000000005ed6fa in standard_ExecutorRun ()
#15 0x0000000000713622 in ProcessQuery ()
#16 0x0000000000713885 in PortalRunMulti ()
#17 0x00000000007143a5 in PortalRun ()
#18 0x0000000000711cf1 in PostgresMain ()
#19 0x00000000006a708b in backend_main_proc ()
#20 0x00007f7e0325a36d in start_thread () from /lib64/libpthread.so.0
#21 0x00007f7e02870b8f in clone () from /lib64/libc.so.6

Thread 996 (Thread 0x7f6e08891700 (LWP 61344)):
#0  0x00007f7e03263576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f7e03263668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000698552 in PGSemaphoreLock ()
#3  0x0000000000702804 in LWLockAcquire ()
#4  0x00000000004bc862 in RelationGetBufferForTuple ()
#5  0x00000000004b60db in heap_insert ()
#6  0x000000000060ad3b in ExecModifyTable ()
#7  0x00000000005ed6fa in standard_ExecutorRun ()
#8  0x0000000000713622 in ProcessQuery ()
#9  0x0000000000713885 in PortalRunMulti ()
#10 0x00000000007143a5 in PortalRun ()
#11 0x0000000000711cf1 in PostgresMain ()
#12 0x00000000006a708b in backend_main_proc ()
#13 0x00007f7e0325a36d in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7e02870b8f in clone () from /lib64/libc.so.6

Thread 995 (Thread 0x7f6e08912700 (LWP 61343)):
#0  0x00007f7e03263576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f7e03263668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000698552 in PGSemaphoreLock ()
#3  0x00000000006f1dad in ProcArrayEndTransaction ()
#4  0x00000000004efae0 in CommitTransaction ()
#5  0x00000000004f0bd5 in CommitTransactionCommand ()
#6  0x000000000070e9cf in finish_xact_command ()
#7  0x0000000000711d13 in PostgresMain ()
#8  0x00000000006a708b in backend_main_proc ()
#9  0x00007f7e0325a36d in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7e02870b8f in clone () from /lib64/libc.so.6

Thread 994 (Thread 0x7f6e08993700 (LWP 61342)):
#0  0x00007f7e03263576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f7e03263668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000698552 in PGSemaphoreLock ()
#3  0x00000000006f1dad in ProcArrayEndTransaction ()
#4  0x00000000004efae0 in CommitTransaction ()
#5  0x00000000004f0bd5 in CommitTransactionCommand ()
#6  0x000000000070e9cf in finish_xact_command ()
#7  0x0000000000711d13 in PostgresMain ()
#8  0x00000000006a708b in backend_main_proc ()
#9  0x00007f7e0325a36d in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7e02870b8f in clone () from /lib64/libc.so.6

Thread 993 (Thread 0x7f6e08a14700 (LWP 61341)):
#0  0x00007f7e03263576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f7e03263668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000698552 in PGSemaphoreLock ()
#3  0x00000000006f1dad in ProcArrayEndTransaction ()
#4  0x00000000004efae0 in CommitTransaction ()
#5  0x00000000004f0bd5 in CommitTransactionCommand ()
#6  0x000000000070e9cf in finish_xact_command ()
#7  0x0000000000711d13 in PostgresMain ()
#8  0x00000000006a708b in backend_main_proc ()
#9  0x00007f7e0325a36d in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7e02870b8f in clone () from /lib64/libc.so.6

Thread 992 (Thread 0x7f6e08a95700 (LWP 61340)):
#0  0x00007f7e03263576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f7e03263668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000698552 in PGSemaphoreLock ()
#3  0x00000000006f1dad in ProcArrayEndTransaction ()
#4  0x00000000004efae0 in CommitTransaction ()
#5  0x00000000004f0bd5 in CommitTransactionCommand ()
#6  0x000000000070e9cf in finish_xact_command ()
#7  0x0000000000711d13 in PostgresMain ()
#8  0x00000000006a708b in backend_main_proc ()
#9  0x00007f7e0325a36d in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7e02870b8f in clone () from /lib64/libc.so.6

Thread 991 (Thread 0x7f6e08b16700 (LWP 61339)):
#0  0x00007f7e03263576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f7e03263668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000698552 in PGSemaphoreLock ()
#3  0x00000000006f1dad in ProcArrayEndTransaction ()
#4  0x00000000004efae0 in CommitTransaction ()
#5  0x00000000004f0bd5 in CommitTransactionCommand ()
#6  0x000000000070e9cf in finish_xact_command ()
#7  0x0000000000711d13 in PostgresMain ()
#8  0x00000000006a708b in backend_main_proc ()
#9  0x00007f7e0325a36d in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7e02870b8f in clone () from /lib64/libc.so.6
....

I am not going to show stack traces of all 1000 threads.
But you may notice that the proc array lock really seems to be a bottleneck.

Re: Postgres with pthread

From
Alexander Korotkov
Date:
On Sat, Dec 9, 2017 at 1:09 AM, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:
I am not going to show stack traces of all 1000 threads.
But you may notice that the proc array lock really seems to be a bottleneck.

Yes, the proc array lock easily becomes a bottleneck on a multicore machine with a large number of connections.  Related to this, another patch that helps with a large number of connections is CSN.  When our snapshot model was invented, xip was just an array of a few elements, and that caused no problem.  Now we're considering threads to help us handle thousands of connections.  A snapshot with thousands of xips looks ridiculous.  Collecting such a large snapshot could be more expensive than a single index lookup.

These two patches, threads and CSN, are both complicated and require hard work over multiple release cycles to get committed.  But I really hope that their cumulative effect can dramatically improve the situation with a high number of connections.  There are already some promising benchmarks in the CSN thread.  I wonder if we can already do some cumulative benchmarks of threads + CSN?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Postgres with pthread

From
Konstantin Knizhnik
Date:
I continue experiments with my pthread prototype.
Latest results are the following:

1. I have eliminated all (I hope) calls of non-reentrant functions 
(getopt, setlocale, setitimer, localtime, ...). So now the parallel 
tests pass.

2. I have implemented deallocation of the top memory context (at thread 
exit) and cleanup of all opened file descriptors.
I had to replace several places where malloc is used with top_malloc: 
allocation in the top context.

3. My prototype now passes all regression tests. But error handling is 
still far from complete.

4. I have performed experiments with replacing the synchronization 
primitives used in Postgres with their pthread analogues.
Unfortunately it has almost no influence on performance.

5. Handling a large number of connections.
The maximum number of postgres connections is almost the same: 100k.
But the memory footprint in the pthreads case was significantly smaller: 
18Gb vs 38Gb.
And the difference in performance was much larger: 60k TPS vs 600k TPS.
Compare this with the performance for 10k clients: 1300k TPS.
This is the read-only pgbench -S test, with 1000 clients per pgbench 
instance.
Since pgbench doesn't allow specifying more than 1000 clients, I 
spawned several instances of pgbench.

Why is handling a large number of connections important?
It allows applications to access postgres directly, without pgbouncer 
or any other external connection pooling tool.
In this case an application can use prepared statements, which can 
almost double the speed of simple queries.

Unfortunately Postgres sessions are not lightweight. Each backend 
maintains its private catalog and relation caches, prepared statement 
cache, ...
For a real database the size of these caches in memory can be several 
megabytes, and warming these caches can take a significant amount of time.
So if we really want to support a large number of connections, we should 
rewrite the caches to be global (shared).
That would save a lot of memory but add synchronization overhead. 
Also, on NUMA, private caches may be more efficient than one global cache.

My prototype can be found at: 
git://github.com/postgrespro/postgresql.pthreads.git


-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Postgres with pthread

From
Pavel Stehule
Date:


2017-12-21 14:25 GMT+01:00 Konstantin Knizhnik <k.knizhnik@postgrespro.ru>:
I continue experiments with my pthread prototype.
Latest results are the following:

1. I have eliminated all (I hope) calls of non-reentrant functions (getopt, setlocale, setitimer, localtime, ...). So now parallel tests are passed.

2. I have implemented deallocation of top memory context (at thread exit) and cleanup of all opened file descriptors.
I have to replace several place where malloc is used with top_malloc: allocation in top context.

3. Now my prototype is passing all regression tests now. But handling of errors is still far from completion.

4. I have performed experiments with replacing synchronization primitives used in Postgres with pthread analogues.
Unfortunately it has almost no influence on performance.

5. Handling large number of connections.
The maximal number of postgres connections is almost the same: 100k.
But memory footprint in case of pthreads was significantly smaller: 18Gb vs 38Gb.
And difference in performance was much higher: 60k TPS vs . 600k TPS.
Compare it with performance for 10k clients: 1300k TPS.
It is read-only pgbench -S test with 1000 connections.
As far as pgbench doesn't allow to specify more than 1000 clients, I spawned several instances of pgbench.

Why handling large number of connections is important?
It allows applications to access postgres directly, not using pgbouncer or any other external connection pooling tool.
In this case an application can use prepared statements, which can almost double the speed of simple queries.

From what I know, MySQL has not had good experience with a high number of threads - there is a thread pool in the enterprise (and now also MariaDB) versions.

Regards

Pavel


Unfortunately Postgres sessions are not lightweight. Each backend maintains its private catalog and relation caches, prepared statement cache,...
For real database size of this caches in memory will be several megabytes and warming this caches can take significant amount of time.
So if we really want to support large number of connections, we should rewrite caches to be global (shared).
It will allow to save a lot of memory but add synchronization overhead. Also at NUMA private caches may be more efficient than one global cache.

My prototype can be found at: git://github.com/postgrespro/postgresql.pthreads.git


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Postgres with pthread

From
Konstantin Knizhnik
Date:


On 21.12.2017 16:25, Konstantin Knizhnik wrote:
I continue experiments with my pthread prototype.
Latest results are the following:

1. I have eliminated all (I hope) calls of non-reentrant functions (getopt, setlocale, setitimer, localtime, ...). So now parallel tests are passed.

2. I have implemented deallocation of top memory context (at thread exit) and cleanup of all opened file descriptors.
I have to replace several place where malloc is used with top_malloc: allocation in top context.

3. Now my prototype is passing all regression tests now. But handling of errors is still far from completion.

4. I have performed experiments with replacing synchronization primitives used in Postgres with pthread analogues.
Unfortunately it has almost no influence on performance.

5. Handling a large number of connections.
The maximal number of Postgres connections is almost the same: 100k.
But the memory footprint in the pthread case was significantly smaller: 18GB vs. 38GB.
And the difference in performance was much larger: 60k TPS vs. 600k TPS.
Compare this with the performance for 10k clients: 1300k TPS.
This is a read-only pgbench -S test, with 1000 connections per pgbench instance.
Since pgbench doesn't allow specifying more than 1000 clients, I spawned several instances of pgbench.

Why is handling a large number of connections important?
It allows applications to access Postgres directly, without pgbouncer or any other external connection pooling tool.
In this case an application can use prepared statements, which can speed up simple queries almost twice.

Unfortunately Postgres sessions are not lightweight. Each backend maintains its private catalog and relation caches, prepared statement cache,...
For a real database, these caches can occupy several megabytes of memory, and warming them can take a significant amount of time.
So if we really want to support a large number of connections, we should rewrite the caches to be global (shared).
That would allow us to save a lot of memory, but it adds synchronization overhead. Also, on NUMA systems private caches may be more efficient than one global cache.

My prototype can be found at: git://github.com/postgrespro/postgresql.pthreads.git


Finally I managed to run Postgres with 100k active connections.
I am not sure that this result qualifies for the Guinness book of records, but I am almost sure that nobody has done it before (at least with the original version of Postgres).
But it was really a "Pyrrhic victory": performance with 100k connections is 1000 times slower than with 10k, and all threads are blocked on semaphores.
This is a more or less expected result, but the scale of the degradation is still impressive:


#Connections    TPS
100k            550
10k             558k
6k              745k
4k              882k
2k              1100k
1k              1300k


As is clear from these stack traces, a shared catalog cache and a shared statement cache are badly needed to provide good performance with such a large number of active backends:


(gdb) thread apply all bt

Thread 17807 (LWP 660863):
#0  0x00007f4c1cb46576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f4c1cb46668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000697a32 in PGSemaphoreLock ()
#3  0x0000000000702a64 in LWLockAcquire ()
#4  0x00000000006fbf2d in LockAcquireExtended ()
#5  0x00000000006f9fa3 in LockRelationOid ()
#6  0x00000000004b2ffd in relation_open ()
#7  0x00000000004b31d6 in heap_open ()
#8  0x00000000007f1ed1 in CatalogCacheInitializeCache ()
#9  0x00000000007f3835 in SearchCatCache1 ()
#10 0x0000000000800510 in get_tablespace ()
#11 0x00000000008006e1 in get_tablespace_page_costs ()
#12 0x000000000065a4e1 in cost_seqscan ()
#13 0x000000000068bf92 in create_seqscan_path ()
#14 0x00000000006568b4 in set_rel_pathlist ()
#15 0x0000000000656eb8 in make_one_rel ()
#16 0x00000000006740d0 in query_planner ()
#17 0x0000000000676526 in grouping_planner ()
#18 0x0000000000679812 in subquery_planner ()
#19 0x000000000067a66c in standard_planner ()
#20 0x000000000070ffe1 in pg_plan_query ()
#21 0x00000000007100b6 in pg_plan_queries ()
#22 0x00000000007f6c6f in BuildCachedPlan ()
#23 0x00000000007f6e5c in GetCachedPlan ()
#24 0x0000000000711ccf in PostgresMain ()
#25 0x00000000006a5535 in backend_main_proc ()
#26 0x00000000006a353d in thread_trampoline ()
#27 0x00007f4c1cb3d36d in start_thread () from /lib64/libpthread.so.0
#28 0x00007f4c1c153b8f in clone () from /lib64/libc.so.6

Thread 17806 (LWP 660861):
#0  0x00007f4c1cb46576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f4c1cb46668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000697a32 in PGSemaphoreLock ()
#3  0x0000000000702a64 in LWLockAcquire ()
#4  0x00000000006fbf2d in LockAcquireExtended ()
#5  0x00000000006f9fa3 in LockRelationOid ()
#6  0x00000000004b2ffd in relation_open ()
#7  0x00000000004b31d6 in heap_open ()
#8  0x00000000007f1ed1 in CatalogCacheInitializeCache ()
#9  0x00000000007f3835 in SearchCatCache1 ()
#10 0x0000000000800510 in get_tablespace ()
#11 0x00000000008006e1 in get_tablespace_page_costs ()
#12 0x000000000065a4e1 in cost_seqscan ()
#13 0x000000000068bf92 in create_seqscan_path ()
#14 0x00000000006568b4 in set_rel_pathlist ()
#15 0x0000000000656eb8 in make_one_rel ()
#16 0x00000000006740d0 in query_planner ()
#17 0x0000000000676526 in grouping_planner ()
#18 0x0000000000679812 in subquery_planner ()
#19 0x000000000067a66c in standard_planner ()
#20 0x000000000070ffe1 in pg_plan_query ()
#21 0x00000000007100b6 in pg_plan_queries ()
#22 0x00000000007f6c6f in BuildCachedPlan ()
#23 0x00000000007f6e5c in GetCachedPlan ()
#24 0x0000000000711ccf in PostgresMain ()
#25 0x00000000006a5535 in backend_main_proc ()
#26 0x00000000006a353d in thread_trampoline ()
#27 0x00007f4c1cb3d36d in start_thread () from /lib64/libpthread.so.0
#28 0x00007f4c1c153b8f in clone () from /lib64/libc.so.6

Thread 17805 (LWP 660856):
#0  0x00007f4c1cb46576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f4c1cb46668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000697a32 in PGSemaphoreLock ()
#3  0x0000000000702a64 in LWLockAcquire ()
#4  0x00000000006fcb1c in LockRelease ()
#5  0x00000000006fa059 in UnlockRelationId ()
#6  0x00000000004b31c5 in relation_close ()
#7  0x00000000007f2e86 in SearchCatCacheMiss ()
#8  0x00000000007f37fd in SearchCatCache1 ()
#9  0x0000000000800510 in get_tablespace ()
#10 0x00000000008006e1 in get_tablespace_page_costs ()
#11 0x000000000065a4e1 in cost_seqscan ()
#12 0x000000000068bf92 in create_seqscan_path ()
#13 0x00000000006568b4 in set_rel_pathlist ()
#14 0x0000000000656eb8 in make_one_rel ()
#15 0x00000000006740d0 in query_planner ()
#16 0x0000000000676526 in grouping_planner ()
#17 0x0000000000679812 in subquery_planner ()
#18 0x000000000067a66c in standard_planner ()
#19 0x000000000070ffe1 in pg_plan_query ()
#20 0x00000000007100b6 in pg_plan_queries ()
#21 0x00000000007f6c6f in BuildCachedPlan ()
#22 0x00000000007f6e5c in GetCachedPlan ()
#23 0x0000000000711ccf in PostgresMain ()
#24 0x00000000006a5535 in backend_main_proc ()
#25 0x00000000006a353d in thread_trampoline ()
#26 0x00007f4c1cb3d36d in start_thread () from /lib64/libpthread.so.0
#27 0x00007f4c1c153b8f in clone () from /lib64/libc.so.6
...

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Postgres with pthread

From
james
Date:
> All threads are blocked in semaphores.
That they are blocked is inevitable - I guess the issue is that they are thrashing.
I guess it would be necessary to separate the internals to have some internal queueing and effectively reduce the number of actively executing threads.
In effect make the connection pooling work internally.

Would it be possible to make the caches have persistent (functional) data structures - effectively CoW?

And how easy would it be to abort if the master view had subsequently changed when it comes to execution?



Re: Postgres with pthread

From
Andres Freund
Date:

On December 27, 2017 11:05:52 AM GMT+01:00, james <james@mansionfamily.plus.com> wrote:
> > All threads are blocked in semaphores.
>That they are blocked is inevitable - I guess the issue is that they
>are
>thrashing.
>I guess it would be necessary to separate the internals to have some
>internal queueing and effectively reduce the number of actively
>executing threads.
>In effect make the connection pooling work internally.
>
>Would it be possible to make the caches have persistent (functional)
>data structures - effectively CoW?
>
>And how easy would it be to abort if the master view had subsequently
>changed when it comes to execution?

Optimizing for this seems like a pointless exercise. If the goal is efficient processing of 100k connections, the solution is a session / connection abstraction and a scheduler. Optimizing for this amount of concurrency will just add complexity and slowdowns for a workload that nobody will run.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Postgres with pthread

From
james
Date:
On 27/12/2017 10:08, Andres Freund wrote:
> Optimizing for this seems like a pointless exercise. If the goal is efficient processing of 100k connections, the solution is a session / connection abstraction and a scheduler. Optimizing for this amount of concurrency will just add complexity and slowdowns for a workload that nobody will run.

Isn't that what's suggested?
The difference is that the session scheduler is inside the server.
Accepting 100k connections is no problem these days - giving each of them an active thread seems to be the issue.



Re: Postgres with pthread

From
Konstantin Knizhnik
Date:

On 27.12.2017 13:08, Andres Freund wrote:
>
> On December 27, 2017 11:05:52 AM GMT+01:00, james <james@mansionfamily.plus.com> wrote:
>>> All threads are blocked in semaphores.
>> That they are blocked is inevitable - I guess the issue is that they
>> are
>> thrashing.
>> I guess it would be necessary to separate the internals to have some
>> internal queueing and effectively reduce the number of actively
>> executing threads.
>> In effect make the connection pooling work internally.
>>
>> Would it be possible to make the caches have persistent (functional)
>> data structures - effectively CoW?
>>
>> And how easy would it be to abort if the master view had subsequently
>> changed when it comes to execution?
> Optimizing for this seems like a pointless exercise. If the goal is efficient processing of 100k connections, the solution is a session / connection abstraction and a scheduler. Optimizing for this amount of concurrency will just add complexity and slowdowns for a workload that nobody will run.
I agree with you that supporting 100k active connections does not have much practical sense now.
But there are many systems with hundreds of cores, and to utilize them we still need to spawn thousands of backends.
In this case Postgres snapshots and local caches become inefficient.
Switching to CSN would somehow solve the problem with snapshots.
But the problem with private caches should also be addressed: it seems very stupid to perform the same work 1000 times and maintain 1000 copies.
Also, in the case of global prepared statements, the presence of a global cache allows spending more time on plan optimization and manual tuning.

Switching to the pthread model significantly simplifies development of shared caches: there are no problems with a statically allocated shared address space or with dynamic segments mapped at different addresses, which prevent the use of normal pointers. Invalidation of a shared cache is also easier: there is no need to send invalidation notifications to all backends.
But it still requires a lot of work. For example, the catalog cache is tightly integrated with the resource owner information.
Also, a shared cache requires synchronization, and this synchronization itself can become a bottleneck.

> Andres

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company