Re: Let's make PostgreSQL multi-threaded - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: Let's make PostgreSQL multi-threaded
Date
Msg-id CAHyXU0z_miQ8QiE+bOKJSd=0OPSsrAboJov6WbVyJ_V_B1RJWg@mail.gmail.com
In response to Re: Let's make PostgreSQL multi-threaded  (David Geier <geidav.pg@gmail.com>)
Responses Re: Let's make PostgreSQL multi-threaded
List pgsql-hackers
On Thu, Jul 27, 2023 at 8:28 AM David Geier <geidav.pg@gmail.com> wrote:
> Hi,
>
> On 6/7/23 23:37, Andres Freund wrote:
> > I think we're starting to hit quite a few limits related to the process model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
> Another problem I haven't seen mentioned yet is the excessive kernel
> memory usage because every process has its own set of page table entries
> (PTEs). Without huge pages the amount of wasted memory can be huge if
> shared buffers are big.
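
(Back-of-the-envelope on the PTE point, just to put a number on it -- a sketch, not a measurement. It assumes x86-64 with 4 kB pages and 8-byte leaf PTEs, backends that have touched all of shared_buffers, and made-up values for shared_buffers and connection count. On Linux the per-process figure shows up as VmPTE in /proc/<pid>/status.)

/* pte_overhead.c -- back-of-the-envelope page table cost of mapping
 * shared_buffers without huge pages (4 kB pages, 8-byte leaf PTEs).
 * All numbers are made up for illustration.
 */
#include <stdio.h>

int main(void)
{
    const double shared_buffers_gb = 64.0;  /* hypothetical setting */
    const double page_size_bytes   = 4096;  /* 4 kB base pages */
    const double pte_bytes         = 8;     /* per leaf PTE on x86-64 */
    const int    backends          = 500;   /* hypothetical connection count */

    double pages       = shared_buffers_gb * 1024 * 1024 * 1024 / page_size_bytes;
    double per_proc_mb = pages * pte_bytes / (1024 * 1024);

    printf("leaf PTEs per backend  : ~%.0f\n", pages);
    printf("page tables per backend: ~%.0f MB\n", per_proc_mb);
    printf("across %d backends     : ~%.1f GB\n",
           backends, per_proc_mb * backends / 1024.0);
    return 0;
}

With those made-up numbers that's roughly 128 MB of page tables per backend and ~62 GB across 500 backends; 2 MB huge pages would cut the entry count by a factor of 512.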

Hm, I noted this upthread, but asking again: would this improve interactions with the operating system and make OOM-kill situations less likely? These are the bane of my existence, and I'm having a hard time finding a way to prevent them other than running pgbouncer and lowering max_connections, which adds complexity. I suspect I'm not the only one dealing with this. What's really scary about these situations is that they come without warning. Here's a pretty typical example per sar -r:

             kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
 14:20:02       461612  15803476     97.16         0  11120280  12346980     60.35  10017820   4806356       220 
 14:30:01       378244  15886844     97.67         0  11239012  12296276     60.10  10003540   4909180       240 
 14:40:01       308632  15956456     98.10         0  11329516  12295892     60.10  10015044   4981784       200 
 14:50:01       458956  15806132     97.18         0  11383484  12101652     59.15   9853612   5019916       112 
 15:00:01     10592736   5672352     34.87         0   4446852   8378324     40.95   1602532   3473020       264   <-- reboot!
 15:10:01      9151160   7113928     43.74         0   5298184   8968316     43.83   2714936   3725092       124 
 15:20:01      8629464   7635624     46.94         0   6016936   8777028     42.90   2881044   4102888       148 
 15:30:01      8467884   7797204     47.94         0   6285856   8653908     42.30   2830572   4323292       436 
 15:40:02      8077480   8187608     50.34         0   6828240   8482972     41.46   2885416   4671620       320 
 15:50:01      7683504   8581584     52.76         0   7226132   8511932     41.60   2998752   4958880       308 
 16:00:01      7239068   9026020     55.49         0   7649948   8496764     41.53   3032140   5358388       232 
 16:10:01      7030208   9234880     56.78         0   7899512   8461588     41.36   3108692   5492296       216 
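
(The kbcommit/%commit columns above are, as far as I understand sar, the kernel's commit accounting -- the same counters exposed as Committed_AS and CommitLimit in /proc/meminfo. A minimal Linux-only sketch that prints them, purely for illustration:)

/* commit_check.c -- print the kernel's memory commit accounting
 * (Committed_AS vs CommitLimit), the counters behind sar's kbcommit/%commit.
 * Linux-specific; purely illustrative.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
    {
        perror("/proc/meminfo");
        return 1;
    }

    char line[256];
    while (fgets(line, sizeof(line), f))
    {
        if (strncmp(line, "CommitLimit:", 12) == 0 ||
            strncmp(line, "Committed_AS:", 13) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}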

The triggering query was heavy (maybe even a runaway); server load was otherwise minimal:

                 CPU     %user     %nice   %system   %iowait    %steal     %idle
 14:30:01        all      9.55      0.00      0.63      0.02      0.00     89.81                                                                            
 14:40:01        all      9.95      0.00      0.69      0.02      0.00     89.33                                                                                       
 14:50:01        all     10.22      0.00      0.83      0.02      0.00     88.93                                                                                     
 15:00:01        all     10.62      0.00      1.63      0.76      0.00     86.99                                                                                    
 15:10:01        all      8.55      0.00      0.72      0.12      0.00     90.61


The conjecture here is that lots of idle connections leave the server with less memory actually available than it appears to have, so a sudden transient demand can destabilize it.
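
(If someone wants to put a number on that, here's a rough Linux-only sketch that sums VmPTE -- the page-table memory -- across processes named "postgres". The name check and lack of error handling are simplifications; illustrative only:)

/* vmpte_sum.c -- sum VmPTE (page table memory) across "postgres" processes,
 * to estimate the per-connection kernel memory that normal accounting hides.
 * Linux-specific sketch; illustrative only.
 */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DIR *proc = opendir("/proc");
    if (!proc)
    {
        perror("/proc");
        return 1;
    }

    long total_kb = 0;
    int  nproc = 0;
    struct dirent *de;

    while ((de = readdir(proc)) != NULL)
    {
        if (!isdigit((unsigned char) de->d_name[0]))
            continue;           /* only numeric (pid) directories */

        char path[64], line[256], comm[64] = "";
        long pte_kb = 0;

        snprintf(path, sizeof(path), "/proc/%s/status", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;           /* process may have exited */

        while (fgets(line, sizeof(line), f))
        {
            if (sscanf(line, "Name: %63s", comm) == 1)
                continue;
            sscanf(line, "VmPTE: %ld kB", &pte_kb);
        }
        fclose(f);

        if (strcmp(comm, "postgres") == 0)
        {
            total_kb += pte_kb;
            nproc++;
        }
    }
    closedir(proc);

    printf("%d postgres processes, ~%ld kB of page tables\n", nproc, total_kb);
    return 0;
}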

Just throwing it out there: if this can be shown to help, that might support moving forward with something like it, either instead of, or along with, O_DIRECT or other internalized database memory management strategies. Fewer context switches, faster page access, etc. are of course nice, but they would not be a game changer for the workloads we see, which are pretty varied (OLTP, analytics), although we don't see extremely high transaction rates.

merlin


 
