Thread: shared_buffers performance
Hi all,
I started doing some performance tests (using pgbench) in order to estimate the impact of DRBD on our servers. My plan is to run some benchmarks without DRBD and then compare them with the same benchmarks run on DRBD. I haven't performed the DRBD benchmarks yet and I'm already facing something I can not explain (so far I have only run read-only tests).

I'm using postgres 8.2.3 on Red Hat, compiled with GCC 3.4.6.

I'm using pgbench with scaling factors in the range [1:500]; my server has 4 cores so I'm testing with 16 clients and 4000 transactions per client: pgbench -t 4000 -c 16 -S db_perf. I did 3 sessions using 3 different values of shared_buffers: 64MB, 256MB and 512MB; my server has 2GB of RAM.

The following graph reports the results:

http://img84.imageshack.us/my.php?image=totalid7.png

As you can see, using 64MB for shared_buffers I'm obtaining better results. Is this something expected, or am I looking in the wrong direction? I'm going to perform the same tests without using the -S option in pgbench, but since that is a time-expensive operation I would like to hear your opinions first.

Regards
Gaetano Mendola
"Gaetano Mendola" <mendola@gmail.com> writes: > The following graph reports the results: > > http://img84.imageshack.us/my.php?image=totalid7.png That's a *fascinating* graph. It seems there are basically three domains. The small domain where the database fits in shared buffers -- though actually this domain seems to hold until the accounts table is about 1G so maybe it's more that the *indexes* fit in memory. Here larger shared buffers do clearly win. The transition domain where performance drops dramatically as the database starts to not fit in shared buffers but does still fit in filesystem cache. Here every megabyte stolen from the filesystem cache makes a *huge* difference. At a scale factor of 120 or so you're talking about a factor of 4 between each of the shared buffer sizes. The large domain where the database doesn't fit in filesystem cache. Here it doesn't make a large difference but the more buffers duplicated between postgres and the filesystem cache the lower the overall cache effectiveness. If we used something like either mmap or directio to avoid the double buffering we would be able to squeeze these into a single curve, as well as push the dropoff slightly to the right. In theory. In practice it would depend on the OS's ability to handle page faults efficiently in the mmap case, and our ability to do read-ahead and cache management in the directio case. And it would be a huge increase in complexity for Postgres and a push into a direction which isn't our "core competency". We might find that while in theory it should perform better our code just can't keep up with Linux's and it doesn't. I'm curious about the total database size as a for each of the scaling factors as well as the total of the index sizes. And how much memory Linux says is being used for filesystem buffers. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Gregory Stark wrote:
> "Gaetano Mendola" <mendola@gmail.com> writes:
>
>> The following graph reports the results:
>>
>> http://img84.imageshack.us/my.php?image=totalid7.png
>
> That's a *fascinating* graph.

It is, isn't it? Thanks Gaetano.

> It seems there are basically three domains.
>
> The small domain where the database fits in shared buffers -- though actually
> this domain seems to hold until the accounts table is about 1G so maybe it's
> more that the *indexes* fit in memory. Here larger shared buffers do clearly
> win.

I think this is actually in two parts - you can see it clearly on the red trace (64MB), less so on the green (256MB) and not at all on the blue (512MB). Presumably the left-hand steeper straight-line decline starts with the working-set in shared-buffers, and the "knee" is where we're down to just indexes in shared-buffers. With the blue I guess you just get the first part, because by the time you're overflowing shared-buffers, you've not got enough disk-cache to take up the slack for you.

I wonder what difference 8.3 makes to this?

--
Richard Huxton
Archonet Ltd
On Mon, 14 Apr 2008, Gaetano Mendola wrote:

> I'm using postgres 8.2.3 on Red Hat compiled with GCC 3.4.6.

8.2.3 has a performance bug that impacts how accurate pgbench results are; you really should be using a later version.

> http://img84.imageshack.us/my.php?image=totalid7.png
> as you can see using 64MB as value for shared_buffers I'm obtaining
> better results.

I'm assuming you've read my scaling article at http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm since you're using the graph template I suggest there.

If you look carefully at your results, you are getting better results for higher shared_buffers values in the cases where performance is memory bound (the lower scale numbers). Things reverse, so that more buffers give worse performance, only when your scale is >100. I wouldn't conclude too much from that. The pgbench select test is doing a low-level operation that doesn't benefit as much from having more memory available to PostgreSQL instead of the OS as a real-world workload will.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
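[Editor's note: one way to see how much of the select workload is actually served from shared_buffers at each setting is to compare block hits and reads around a run. A minimal sketch using the standard statistics views; resetting counters between runs and the database name db_perf are assumptions about the test procedure, not something prescribed above, and on 8.2 the block counters are only collected when stats_block_level is enabled:

-- Reset statistics before a pgbench run (affects the current database).
SELECT pg_stat_reset();

-- After the run: fraction of block requests satisfied from shared_buffers.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3) AS hit_ratio
FROM pg_stat_database
WHERE datname = 'db_perf';]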
On Mon, 14 Apr 2008, Gregory Stark wrote:

> I'm curious about the total database size for each of the scaling factors
> as well as the total of the index sizes.

That's all in a table at http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Gregory Stark <stark@enterprisedb.com> writes:

> The transition domain where performance drops dramatically as the database
> starts to not fit in shared buffers but does still fit in filesystem cache.

It looks to me like the knee comes where the DB no longer fits in filesystem cache. What's interesting is that there seems to be no synergy at all between shared_buffers and the filesystem cache. Ideally, very hot pages would stay in shared buffers and drop out of the kernel cache, allowing you to use a database approximating all-of-RAM before you hit the performance wall. It's clear that in this example that's not happening, or at least that only a small part of shared buffers isn't getting duplicated in filesystem cache.

Of course, that's because pgbench reads a randomly-chosen row of "accounts" in each transaction, so that there's exactly zero locality of access. A more realistic workload would probably have a Zipfian distribution of account number touches, and might look a little better on this type of test.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> The transition domain where performance drops dramatically as the database >> starts to not fit in shared buffers but does still fit in filesystem cache. > > It looks to me like the knee comes where the DB no longer fits in > filesystem cache. That does seem to make a lot more sense. I think I misread the units of the size of the accounts table. Reading it again it seems to be in the 1.5G-2G range for the transition which with indexes and other tables might be starting to stress the filesystem cache -- though it still seems a little low. I think if I squint I can see another dropoff at the very small scaling numbers. That must be the point where the database is comparable to the shared buffers size. Except then I would expect the green and blue curves to be pushed to the right a bit rather than just havin a shallower slope. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
On Mon, 14 Apr 2008, Tom Lane wrote:

> Ideally, very hot pages would stay in shared buffers and drop out of the
> kernel cache, allowing you to use a database approximating all-of-RAM
> before you hit the performance wall.

With "pgbench -S", the main hot pages that get elevated usage counts and cling persistently to shared buffers are those holding data from the primary key on the accounts table. Here's an example of what the buffer cache actually has after running "pgbench -S -c 8 -t 10000 pgbench" on a system with shared_buffers=256MB and a total of 2GB of RAM. Database scale is 100, so there's approximately 1.5GB worth of database, mainly a 1.3GB accounts table and 171MB of primary key on accounts:

    relname    | buffered | buffers % | % of rel
---------------+----------+-----------+----------
 accounts      | 306 MB   |      65.3 |     24.7
 accounts pkey | 160 MB   |      34.1 |     93.2

    relname    | buffers | usage
---------------+---------+-------
 accounts      |   10223 |     0
 accounts      |   25910 |     1
 accounts      |    2825 |     2
 accounts      |     214 |     3
 accounts      |      14 |     4
 accounts pkey |    2173 |     0
 accounts pkey |    5392 |     1
 accounts pkey |    5086 |     2
 accounts pkey |    3747 |     3
 accounts pkey |    2296 |     4
 accounts pkey |    1756 |     5

This example and the queries to produce that summary are all from the "Inside the PostgreSQL Buffer Cache" talk on my web page. For this simple workload, if you can fit the main primary key in shared buffers that helps, but making that too large takes away memory that could be more usefully given to the OS to manage.

The fact that you can start to suffer from double-buffering (where the data is in the OS filesystem cache and shared_buffers) when making shared_buffers too large on a benchmark workload is interesting. But I'd suggest considering the real application, rather than drawing a conclusion about shared_buffers sizing based just on that phenomenon.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
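[Editor's note: the exact queries are in Greg's talk. A minimal sketch of the second summary above, assuming contrib/pg_buffercache is installed in the benchmark database; the usagecount column it relies on appears in the 8.3-era version of the module:

-- Buffers per relation and usage count, current database only.
SELECT c.relname,
       count(*) AS buffers,
       b.usagecount AS usage
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
JOIN pg_database d ON b.reldatabase = d.oid AND d.datname = current_database()
GROUP BY c.relname, b.usagecount
ORDER BY c.relname, b.usagecount;]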
Greg Smith wrote:
> On Mon, 14 Apr 2008, Gaetano Mendola wrote:
>
>> I'm using postgres 8.2.3 on Red Hat compiled with GCC 3.4.6.
>
> 8.2.3 has a performance bug that impacts how accurate pgbench results
> are; you really should be using a later version.

Thank you, I will give it a shot and perform some tests to see if the results change a lot; in that case I will repeat the entire set of benchmarks.

Regards
Gaetano Mendola
Greg Smith wrote:
> On Mon, 14 Apr 2008, Gaetano Mendola wrote:
>
>> I'm using postgres 8.2.3 on Red Hat compiled with GCC 3.4.6.
>
> 8.2.3 has a performance bug that impacts how accurate pgbench results
> are; you really should be using a later version.
>
>> http://img84.imageshack.us/my.php?image=totalid7.png
>> as you can see using 64MB as value for shared_buffers I'm obtaining
>> better results.
>
> I'm assuming you've read my scaling article at
> http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm
> since you're using the graph template I suggest there.

Yes, I was basically inspired by that page; my true goal is not to study the effect of shared_buffers (that was a side effect) but to study the performance loss when using DRBD on our server.

I'm producing similar graphs using pgbench without -S; I will post them as soon as they are ready.

Regards
Gaetano Mendola
Gaetano Mendola wrote:
> Hi all,
> I started doing some performance tests (using pgbench) in order to
> estimate the impact of DRBD on our servers. My plan is to run some
> benchmarks without DRBD and then compare them with the same benchmarks
> run on DRBD. I haven't performed the DRBD benchmarks yet and I'm
> already facing something I can not explain (so far I have only run
> read-only tests).
>
> I'm using postgres 8.2.3 on Red Hat, compiled with GCC 3.4.6.
>
> I'm using pgbench with scaling factors in the range [1:500]; my server
> has 4 cores so I'm testing with 16 clients and 4000 transactions per
> client: pgbench -t 4000 -c 16 -S db_perf. I did 3 sessions using 3
> different values of shared_buffers: 64MB, 256MB and 512MB; my server
> has 2GB of RAM.
>
> The following graph reports the results:
>
> http://img84.imageshack.us/my.php?image=totalid7.png
>
> As you can see, using 64MB for shared_buffers I'm obtaining better
> results. Is this something expected, or am I looking in the wrong
> direction? I'm going to perform the same tests without using the -S
> option in pgbench, but since that is a time-expensive operation I
> would like to hear your opinions first.

I have today completed the other benchmarks, using pgbench in write mode as well, and the following graph summarizes the results:

http://img440.imageshack.us/my.php?image=totalwbn0.png

What I can say here is that the trend is the opposite of the one seen in read-only mode: increasing shared_buffers increases the TPS.

I still haven't upgraded to 8.2.7 as suggested by Greg Smith, because I would like to compare the results obtained so far with the new ones (simulations are running while I write) using postgres on a "DRBD partition"; as soon as the current tests terminate I will of course upgrade postgres.

If you have any suggestions on what you would like to see/know, just let me know.

Regards
Gaetano Mendola
Hello,

I need two pieces of advice on vacuum settings. I have a quad core X5355 @ 2.66GHz with 8 GB of memory.

1Q) Why does autovacuum not work? I have set the value to on in postgresql.conf, but when the server starts it's still off!!!!

2Q) Here are my settings for vacuum; could you help me to optimise them? At the moment the VACUUM ANALYZE run every night is taking around 18 h to complete, which slows down the server performance.

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 5            # 0-1000 milliseconds
vacuum_cost_page_hit = 1000      # 0-10000 credits
vacuum_cost_page_miss = 1000     # 0-10000 credits
vacuum_cost_page_dirty = 120     # 0-10000 credits
vacuum_cost_limit = 20           # 0-10000 credits

# - Background writer -

bgwriter_delay = 50              # 10-10000 milliseconds between rounds
bgwriter_lru_percent = 1.0       # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 25       # 0-1000 buffers max written/round
bgwriter_all_percent = 0.333     # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 50       # 0-1000 buffers max written/round

Thanks in advance for your help

Regards
David
dforums wrote:
> Hello,
>
> I need two pieces of advice on vacuum settings.
>
> I have a quad core X5355 @ 2.66GHz with 8 GB of memory
>
> 1Q) Why does autovacuum not work? I have set the value to on in
> postgresql.conf, but when the server starts it's still off!!!!

You need to turn stats_row_level on too.

> # - Cost-Based Vacuum Delay -
>
> vacuum_cost_delay = 5            # 0-1000 milliseconds
> vacuum_cost_page_hit = 1000      # 0-10000 credits
> vacuum_cost_page_miss = 1000     # 0-10000 credits
> vacuum_cost_page_dirty = 120     # 0-10000 credits
> vacuum_cost_limit = 20           # 0-10000 credits

The costs are all too high and the limit too low. I suggest resetting them to the default values and figuring out a reasonable delay limit (your current 5ms value seems a bit too low, but I think in most cases 10ms is the practical limit due to sleep granularity in the kernel. In any case, since the other values are all wrong, I suggest just setting it to 10ms and seeing what happens).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
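[Editor's note: a quick way to confirm that autovacuum and the stats settings it depends on are actually in effect on the running server is to query pg_settings. A minimal sketch, with setting names as in 8.2 -- stats_row_level was folded into track_counts in 8.3:

-- Autovacuum needs the stats collector with row-level stats enabled.
SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum', 'stats_start_collector', 'stats_row_level');]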
dforums <dforums 'at' vieonet.com> writes:

> 2Q) Here are my settings for vacuum; could you help me to optimise
> them? At the moment the VACUUM ANALYZE run every night is taking
> around 18 h to complete, which slows down the server performance.

It's a lot of time for a daily job (and it is interesting to vacuum hot tables more often than daily). With typical settings, it's probable that autovacuum will run forever (e.g. at the end of one run, another run will already be needed).

You should first verify you don't have bloat in your tables (a lot of dead rows) - bloat can be created by too infrequent vacuuming and too low FSM settings[1]. To fix the bloat, you can dump and restore your DB if you can afford interrupting your application, or use VACUUM FULL if you can afford blocking your application (disclaimer: many posters here passionately dislike VACUUM FULL and keep on suggesting the use of CLUSTER).

Ref:

[1] to say whether you have bloat, you can use contrib/pgstattuple (you can easily add it to a running PostgreSQL). If the free_percent reported for interesting tables is large, and free_space is large compared to 8K, then you have bloat; another way is to dump your database, restore it onto another database, issue VACUUM VERBOSE on a given table on both databases (in live, and on the restore) and compare the reported number of pages needed. The difference is the bloat.

live=# VACUUM VERBOSE interesting_table;
[...]
INFO:  "interesting_table": found 408 removable, 64994 nonremovable row versions in 4395 pages

restored=# VACUUM VERBOSE interesting_table;
[...]
INFO:  "interesting_table": found 0 removable, 64977 nonremovable row versions in 628 pages

=> (4395-628)*8/1024.0 MB of bloat

(IIRC, this VACUUM output is for 7.4, it has changed a bit since then)

--
Guillaume Cottenceau
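[Editor's note: a minimal sketch of the pgstattuple check Guillaume describes, assuming contrib/pgstattuple is installed in the database; interesting_table is his placeholder name:

-- A large free_percent, with free_space well above a single 8K page,
-- suggests the table is bloated.
SELECT table_len,
       tuple_count,
       dead_tuple_count,
       pg_size_pretty(free_space) AS free_space,
       free_percent
FROM pgstattuple('interesting_table');]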
On Apr 14, 2008, at 3:31 PM, Tom Lane wrote:

> Gregory Stark <stark@enterprisedb.com> writes:
>> The transition domain where performance drops dramatically as the database
>> starts to not fit in shared buffers but does still fit in filesystem cache.
>
> It looks to me like the knee comes where the DB no longer fits in
> filesystem cache. What's interesting is that there seems to be no
> synergy at all between shared_buffers and the filesystem cache.
> Ideally, very hot pages would stay in shared buffers and drop out of the
> kernel cache, allowing you to use a database approximating all-of-RAM
> before you hit the performance wall. It's clear that in this example
> that's not happening, or at least that only a small part of shared
> buffers isn't getting duplicated in filesystem cache.

I suspect that we're getting double-buffering on everything because every time we dirty a buffer and write it out, the OS counts that as an access and keeps that data in its cache. It would be interesting to try to overcome that and see how it impacts things. With our improvement in checkpoint handling, we might be able to just write via DIO... if not, maybe there's some way to tell the OS to buffer the write for us, but target that data for removal from cache as soon as it's written.

> Of course, that's because pgbench reads a randomly-chosen row of
> "accounts" in each transaction, so that there's exactly zero locality
> of access. A more realistic workload would probably have a Zipfian
> distribution of account number touches, and might look a little better
> on this type of test.

--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828