Re: Wait free LW_SHARED acquisition - v0.9 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Wait free LW_SHARED acquisition - v0.9 |
Date | |
Msg-id | 20141011005901.GF6724@awork2.anarazel.de |
In response to | Re: Wait free LW_SHARED acquisition - v0.9 (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Wait free LW_SHARED acquisition - v0.9 |
List | pgsql-hackers |
On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
> On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> > On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
> > > On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund <andres@2ndquadrant.com>
> > > wrote:
> > > > > Observations
> > > > > ----------------------
> > > > > a. The patch performs really well (increase upto ~40%) incase all the
> > > > > data fits in shared buffers (scale factor -100).
> > > > > b. Incase data doesn't fit in shared buffers, but fits in RAM
> > > > > (scale factor -3000), there is performance increase upto 16 client count,
> > > > > however after that it starts dipping (in above config unto ~4.4%).
> > > >
> > > > Hm. Interesting. I don't see that dip on x86.
> > >
> > > Is it possible that implementation of some atomic operation is costlier
> > > for particular architecture?
> >
> > Yes, sure. And IIRC POWER improved atomics performance considerably for
> > POWER8...
> >
> > > I have tried again for scale factor 3000 and could see the dip and this
> > > time I have even tried with 175 client count and the dip is approximately
> > > 5% which is slightly more than 160 client count.

I've run some short tests on hydra:

scale 1000:

base:
4GB:
tps = 296273.004800 (including connections establishing)
tps = 296373.978100 (excluding connections establishing)
8GB:
tps = 338001.455970 (including connections establishing)
tps = 338177.439106 (excluding connections establishing)

base + freelist:
4GB:
tps = 297057.523528 (including connections establishing)
tps = 297156.987418 (excluding connections establishing)
8GB:
tps = 335123.867097 (including connections establishing)
tps = 335239.122472 (excluding connections establishing)

base + LW_SHARED:
4GB:
tps = 296262.164455 (including connections establishing)
tps = 296357.524819 (excluding connections establishing)
8GB:
tps = 336988.744742 (including connections establishing)
tps = 337097.836395 (excluding connections establishing)

base + LW_SHARED + freelist:
4GB:
tps = 296887.981743 (including connections establishing)
tps = 296980.231853 (excluding connections establishing)
8GB:
tps = 345049.062898 (including connections establishing)
tps = 345161.947055 (excluding connections establishing)

I've also run some preliminary tests using scale=3000 - and I couldn't
see a performance difference either.

Note that all these are noticeably faster than your results.

> > >
> > > Lwlock_contention patches - client_count=128
> > > ----------------------------------------------------------------------
> > >
> > > +   7.95%  postgres  postgres           [.] GetSnapshotData
> > > +   3.58%  postgres  postgres           [.] AllocSetAlloc
> > > +   2.51%  postgres  postgres           [.] _bt_compare
> > > +   2.44%  postgres  postgres           [.] hash_search_with_hash_value
> > > +   2.33%  postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
> > > +   2.24%  postgres  postgres           [.] AllocSetFreeIndex
> > > +   1.75%  postgres  postgres           [.] pg_atomic_fetch_add_u32_impl
> >
> > Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
> > the compiler settings you used?
>
> Nothing specific, for performance tests where I have to take profiles
> I use below:
> ./configure --prefix=<installation_path> CFLAGS="-fno-omit-frame-pointer"
> make

Hah. Doing so overwrites the CFLAGS configure normally sets. Check

# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else: If the template file set something, that is used.
# else: If coverage was enabled, don't set anything.
# else: If the compiler is GCC, then we use -O2.
# else: If the compiler is something else, then we use -O, unless debugging.

so, if you do like above, you're compiling without optimizations... So,
include at least -O2 as well.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
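
For illustration, the selection order in the quoted configure comment can be sketched as a small shell function. This is only a sketch under stated assumptions: `select_cflags` and its four arguments are hypothetical names, it is not the actual configure code, and the "unless debugging" exception in the last rule is left out. It shows why passing only -fno-omit-frame-pointer drops the default -O2, and why the profiling build probably wants something like CFLAGS="-O2 -fno-omit-frame-pointer" instead.

```shell
# Hypothetical sketch of the CFLAGS selection order quoted above.
# Not the real configure logic; the "unless debugging" case is omitted.
select_cflags() {
    user_cflags=$1      # CFLAGS given by the user, may be empty
    template_cflags=$2  # CFLAGS set by the platform template, may be empty
    coverage=$3         # "yes" if coverage was enabled
    compiler=$4         # "gcc" or anything else

    if [ -n "$user_cflags" ]; then
        printf '%s\n' "$user_cflags"      # user setting wins, nothing is added
    elif [ -n "$template_cflags" ]; then
        printf '%s\n' "$template_cflags"
    elif [ "$coverage" = "yes" ]; then
        printf '\n'                       # coverage: don't set anything
    elif [ "$compiler" = "gcc" ]; then
        printf '%s\n' "-O2"
    else
        printf '%s\n' "-O"
    fi
}

# The invocation from the quoted mail: no optimization flag survives.
select_cflags "-fno-omit-frame-pointer" "" no gcc      # prints: -fno-omit-frame-pointer

# No user CFLAGS with GCC: the default applies.
select_cflags "" "" no gcc                             # prints: -O2

# Keeping frame pointers and optimization together:
select_cflags "-O2 -fno-omit-frame-pointer" "" no gcc  # prints: -O2 -fno-omit-frame-pointer
```

With the real build, that would correspond to running
./configure --prefix=<installation_path> CFLAGS="-O2 -fno-omit-frame-pointer"
followed by make.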