experimental: replace s_lock spinlock code with pthread_mutex on linux - Mailing list pgsql-hackers
From | Nils Goroll |
---|---|
Subject | experimental: replace s_lock spinlock code with pthread_mutex on linux |
Date | |
Msg-id | 4FEA3EA7.1040104@schokola.de |
In response to | Re: why roll-your-own s_lock? / improving scalability (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: experimental: replace s_lock spinlock code with pthread_mutex on linux; Re: experimental: replace s_lock spinlock code with pthread_mutex on linux |
List | pgsql-hackers |
> It's still unproven whether it'd be an improvement, but you could expect to
> prove it one way or the other with a well-defined amount of testing.

I've hacked the code to use adaptive pthread mutexes instead of spinlocks; see the attached patch. The patch is against git HEAD, but it applies easily to 9.1.3, which is what I used for my tests.

This had disastrous effects on Solaris, because Solaris does not use anything similar to futexes for PTHREAD_PROCESS_SHARED mutexes (only the _PRIVATE mutexes avoid syscalls in the uncontended case). But I was surprised to see that it works relatively well on Linux.

Here's a glimpse of my results:

hacked code 9.1.3:

```
-bash-4.1$ rsync -av --delete /tmp/test_template_data/ ../data/ ; /usr/bin/time ./postgres -D ../data -p 55502 & ppid=$! ; pid=$(pgrep -P $ppid ) ; sleep 15 ; ./pgbench -c 768 -t 20 -j 128 -p 55502 postgres ; kill $pid
sending incremental file list
...
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 768
number of threads: 128
number of transactions per client: 20
number of transactions actually processed: 15360/15360
tps = 476.873261 (including connections establishing)
tps = 485.964355 (excluding connections establishing)
LOG: received smart shutdown request
LOG: autovacuum launcher shutting down
-bash-4.1$ LOG: shutting down
LOG: database system is shut down
210.58user 78.88system 0:50.64elapsed 571%CPU (0avgtext+0avgdata 1995968maxresident)k
0inputs+1153872outputs (0major+2464649minor)pagefaults 0swaps
```

original code (vanilla build on amd64) 9.1.3:

```
-bash-4.1$ rsync -av --delete /tmp/test_template_data/ ../data/ ; /usr/bin/time ./postgres -D ../data -p 55502 & ppid=$! ; pid=$(pgrep -P $ppid ) ; sleep 15 ; ./pgbench -c 768 -t 20 -j 128 -p 55502 postgres ; kill $pid
sending incremental file list
...
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 768
number of threads: 128
number of transactions per client: 20
number of transactions actually processed: 15360/15360
tps = 499.993685 (including connections establishing)
tps = 510.410883 (excluding connections establishing)
LOG: received smart shutdown request
-bash-4.1$ LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: database system is shut down
196.21user 71.38system 0:47.99elapsed 557%CPU (0avgtext+0avgdata 1360800maxresident)k
0inputs+1147904outputs (0major+2375965minor)pagefaults 0swaps
```

config:

```
-bash-4.1$ egrep '^[a-z]' /tmp/test_template_data/postgresql.conf
max_connections = 1800 # (change requires restart)
shared_buffers = 10GB # min 128kB
temp_buffers = 64MB # min 800kB
work_mem = 256MB # min 64kB,d efault 1MB
maintenance_work_mem = 2GB # min 1MB, default 16MB
bgwriter_delay = 10ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 10.0 # 0-10.0 multipler on buffers scanned/round
wal_level = hot_standby # minimal, archive, or hot_standby
wal_buffers = 64MB # min 32kB, -1 sets based on shared_buffers
commit_delay = 10000 # range 0-100000, in microseconds
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8' # locale for system error message
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8' # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 1.5 # same scale as above (default: 4.0)
cpu_tuple_cost = 0.005
cpu_index_tuple_cost = 0.0025
cpu_operator_cost = 0.0001
effective_cache_size = 192GB
```

So it looks like using pthread_mutexes could at least be an option on Linux: with 768 clients, the hacked build reached roughly 95% of the vanilla throughput (486 vs. 510 tps excluding connection establishment). Using futexes directly could be even cheaper.
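For readers who want to try the idea in isolation, here is a minimal sketch (not the attached patch) of a spinlock replaced by an adaptive, process-shared pthread mutex, assuming Linux with glibc. Because PostgreSQL backends are separate processes, the mutex must live in shared memory and be marked `PTHREAD_PROCESS_SHARED`; `PTHREAD_MUTEX_ADAPTIVE_NP` is a non-portable glibc extension. An anonymous `MAP_SHARED` mapping plus `fork()` stands in for PostgreSQL's shared memory and backends, and `shared_state`, `shared_mutex_init()` and `run_demo()` are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE /* PTHREAD_MUTEX_ADAPTIVE_NP is a glibc extension */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a spinlock-protected struct in shared memory. */
typedef struct
{
    pthread_mutex_t mutex;   /* takes the place of an slock_t */
    long            counter; /* data protected by the mutex */
} shared_state;

/* Initialise a mutex that several processes (not just threads) can use. */
static int
shared_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int         rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    /* The mutex lives in memory shared by forked processes. */
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    /* glibc adaptive type: spin briefly in user space before blocking. */
    if (rc == 0)
        rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/*
 * Fork nproc children that each increment the shared counter iters times
 * under the mutex.  Returns the final counter, or -1 on error.
 */
static long
run_demo(int nproc, int iters)
{
    shared_state *st = mmap(NULL, sizeof(*st), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int         ok = 1;
    long        result;

    assert(nproc >= 0 && iters >= 0);
    if (st == MAP_FAILED)
        return -1;
    st->counter = 0;
    if (shared_mutex_init(&st->mutex) != 0)
    {
        munmap(st, sizeof(*st));
        return -1;
    }

    for (int p = 0; p < nproc && ok; p++)
    {
        pid_t       pid = fork();

        if (pid < 0)
            ok = 0;
        else if (pid == 0)
        {
            for (int i = 0; i < iters; i++)
            {
                pthread_mutex_lock(&st->mutex);   /* cf. SpinLockAcquire() */
                st->counter++;
                pthread_mutex_unlock(&st->mutex); /* cf. SpinLockRelease() */
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;                       /* reap every child */

    result = ok ? st->counter : -1;
    pthread_mutex_destroy(&st->mutex);
    munmap(st, sizeof(*st));
    return result;
}
```

Built with `-pthread`, `run_demo(4, 100000)` should return 400000; lost updates from a broken lock would show up as a smaller number. On Linux/NPTL the uncontended lock and unlock of such a mutex stay in user space, which fits the relatively small throughput difference measured above.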
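Using futexes directly could look roughly like the following sketch, assuming Linux and C11 atomics. It is the classic three-state mutex described in Ulrich Drepper's paper "Futexes Are Tricky", not anything taken from the patch; `sys_futex()`, `futex_lock()`, `futex_unlock()` and `run_futex_demo()` are hypothetical names. The uncontended lock and unlock are one atomic instruction each, and unlock wakes exactly one sleeper via `FUTEX_WAKE`. glibc provides no `futex()` wrapper, so the sketch calls `syscall(SYS_futex, ...)`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked and possibly contended. */
static_assert(sizeof(atomic_int) == sizeof(int), "futex word must be 32 bits");

/* No FUTEX_PRIVATE_FLAG: the word is shared between processes. */
static long
sys_futex(atomic_int *word, int op, int val)
{
    return syscall(SYS_futex, (int *) word, op, val, NULL, NULL, 0);
}

static void
futex_lock(atomic_int *word)
{
    int         c = 0;

    /* Uncontended fast path: 0 -> 1 with a single CAS, no system call. */
    if (atomic_compare_exchange_strong(word, &c, 1))
        return;
    /* Contended: advertise a waiter (state 2), then sleep until released. */
    if (c != 2)
        c = atomic_exchange(word, 2);
    while (c != 0)
    {
        sys_futex(word, FUTEX_WAIT, 2); /* returns at once if *word != 2 */
        c = atomic_exchange(word, 2);
    }
}

static void
futex_unlock(atomic_int *word)
{
    /* Old value 1 means nobody was waiting: no system call needed. */
    if (atomic_fetch_sub(word, 1) != 1)
    {
        atomic_store(word, 0);
        sys_futex(word, FUTEX_WAKE, 1); /* wake exactly one waiter */
    }
}

/* Hypothetical demo: a futex-protected counter in shared memory. */
typedef struct
{
    atomic_int  lock;
    long        counter;
} shared_state;

static long
run_futex_demo(int nproc, int iters)
{
    shared_state *st = mmap(NULL, sizeof(*st), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int         ok = 1;
    long        result;

    assert(nproc >= 0 && iters >= 0);
    if (st == MAP_FAILED)
        return -1;
    atomic_init(&st->lock, 0);
    st->counter = 0;

    for (int p = 0; p < nproc && ok; p++)
    {
        pid_t       pid = fork();

        if (pid < 0)
            ok = 0;
        else if (pid == 0)
        {
            for (int i = 0; i < iters; i++)
            {
                futex_lock(&st->lock);
                st->counter++;
                futex_unlock(&st->lock);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;                       /* reap every child */

    result = ok ? st->counter : -1;
    munmap(st, sizeof(*st));
    return result;
}
```

`run_futex_demo(4, 100000)` should return 400000. Unlike the adaptive pthread mutex, this version never spins before calling `FUTEX_WAIT`; a bounded retry loop on the CAS could be added in front of the slow path if short spinning turns out to matter.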
As a side note, it seems I did not express myself clearly: I did not intend to suggest replacing proven, working code (which is probably the best you can get on some platforms) with POSIX calls. I apologize for the provocative question.

Regarding the actual production issue: using pgbench, I did not manage to synthetically provoke the saturation we are seeing in production; I could not even get anywhere near the production load. So I cannot currently test whether reducing the amount of spinning and waking up exactly one waiter (which is what Linux/NPTL pthread_mutex_unlock does) would solve or mitigate the production issue I am working on. I'd highly appreciate any pointers in this direction.

Cheers,
Nils
Attachment