Thread: Potential memory usage issue
Hi,

I recently migrated one of our large (multi-hundred GB) dbs from an
Intel 32bit platform (Dell 1650 - running 8.1.3) to a 64bit platform
(Dell 1950 - running 8.1.5). However I am not seeing the performance
gains I would expect - I suspect that some of this is due to
differences I am seeing in reported memory usage.

On the 1650 - a 'typical' postmaster process looks like this in top:

5267 postgres 16 0 439m 427m 386m S 3.0 21.1 3:31.73 postmaster

On the 1950 - a 'typical' postmaster process looks like:

10304 postgres 16 0 41896 13m 11m D 4 0.3 0:11.73 postmaster

I currently have both systems running in parallel so the workloads will
be approximately equal. The postgresql.conf files on the two systems are
pretty much identical; I did make some changes to logging, but nothing
to the buffers/shared memory config.

I have never seen a postmaster process on the new system consume
anywhere near as much RAM as on the old system - I am wondering if there
is something up with the shared memory config/usage that is causing my
performance issues. Any thoughts as to where I should go from here?

Thanks,

David.

--
David Brain - bandwidth.com
dbrain@bandwidth.com
In response to David Brain <dbrain@bandwidth.com>:

>
> I recently migrated one of our large (multi-hundred GB) dbs from an
> Intel 32bit platform (Dell 1650 - running 8.1.3) to a 64bit platform
> (Dell 1950 - running 8.1.5). However I am not seeing the performance
> gains I would expect

What were you expecting? It's possible that your expectations are
unreasonable.

In our testing, we found that 64bit on the same hardware as 32bit only
gave us a 5% gain, in the best case. In many cases the gain was near 0,
and in some there was a small performance loss. These findings seemed to
jibe with what others have been reporting.

> - I am suspecting that some of this is due to
> differences I am seeing in reported memory usage.
>
> On the 1650 - a 'typical' postmaster process looks like this in top:
>
> 5267 postgres 16 0 439m 427m 386m S 3.0 21.1 3:31.73 postmaster
>
> On the 1950 - a 'typical' postmaster process looks like:
>
> 10304 postgres 16 0 41896 13m 11m D 4 0.3 0:11.73 postmaster
>
> I currently have both systems running in parallel so the workloads will
> be approximately equal. The configurations of the two systems in terms
> of postgresql.conf is pretty much identical between the two systems, I
> did make some changes to logging, but nothing to buffers/shared memory
> config.
>
> I have never seen a postmaster process on the new system consume
> anywhere near as much RAM as the old system - I am wondering if there is
> something up with the shared memory config/usage that is causing my
> performance issues. Any thoughts as to where I should go from here?

Provide more information, for one thing. I'm assuming from the top output
that this is some version of Linux, but more details on that are liable
to elicit more helpful feedback.

We run everything on FreeBSD here, but I haven't seen any difference in
the way PostgreSQL uses memory on ia32 FreeBSD vs. amd64 FreeBSD.

Without more details on your setup, my only suggestion would be to
double-check that your postgresql.conf settings are correct on the
64 bit system.

--
Bill Moran
Collaborative Fusion Inc.
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

****************************************************************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
****************************************************************
Hi,

Thanks for the response.

Bill Moran wrote:
> In response to David Brain <dbrain@bandwidth.com>:
>> I recently migrated one of our large (multi-hundred GB) dbs from an
>> Intel 32bit platform (Dell 1650 - running 8.1.3) to a 64bit platform
>> (Dell 1950 - running 8.1.5). However I am not seeing the performance
>> gains I would expect
>
> What were you expecting? It's possible that your expectations are
> unreasonable.
>

Possibly - but there is a fair step up hardware performance wise from a
1650 (Dual 1.4 GHz PIII with U160 SCSI) to a 1950 (Dual, Dual Core 2.3
GHz Xeons with SAS) - so I wasn't necessarily expecting much from the
32->64 transition (except maybe the option to go > 4GB easily - although
currently we only have 4GB in the box), but I was expecting more from
the hardware standpoint.

I am curious as to why 'top' gives such different output on the two
systems - the datasets are large and so I know I benefit from having
high shared_buffers and effective_cache_size settings.

> Provide more information, for one thing. I'm assuming from the top output
> that this is some version of Linux, but more details on that are liable
> to elicit more helpful feedback.
>

Yes the OS is Linux - on the 1650 version 2.6.14, on the 1950 version
2.6.18.

Thanks,

David.

--
David Brain - bandwidth.com
dbrain@bandwidth.com
919.297.1078
In response to David Brain <dbrain@bandwidth.com>:

>
> Thanks for the response.
> Bill Moran wrote:
> > In response to David Brain <dbrain@bandwidth.com>:
> >> I recently migrated one of our large (multi-hundred GB) dbs from an
> >> Intel 32bit platform (Dell 1650 - running 8.1.3) to a 64bit platform
> >> (Dell 1950 - running 8.1.5). However I am not seeing the performance
> >> gains I would expect
> >
> > What were you expecting? It's possible that your expectations are
> > unreasonable.
>
> Possibly - but there is a fair step up hardware performance wise from a
> 1650 (Dual 1.4 GHz PIII with U160 SCSI) to a 1950 (Dual, Dual Core 2.3
> GHz Xeons with SAS) - so I wasn't necessarily expecting much from the
> 32->64 transition (except maybe the option to go > 4GB easily - although
> currently we only have 4GB in the box), but was from the hardware
> standpoint.

Ahh ... I didn't get that from your original message.

> I am curious as to why 'top' gives such different output on the two
> systems - the datasets are large and so I know I benefit from having
> high shared_buffers and effective_cache_size settings.

Have you done any actual queries on the new system? PG won't use the
shm until it needs it -- and that doesn't occur until it gets a request
for data via a query.

Install the pg_buffercache contrib module and take a look at how shared
memory is being used. I like to use MRTG to graph shared buffer usage
over time, but you can just do a SELECT count(*) FROM pg_buffercache
WHERE reldatabase IS NOT NULL to see how many buffers are actually in
use.

> > Provide more information, for one thing. I'm assuming from the top output
> > that this is some version of Linux, but more details on that are liable
> > to elicit more helpful feedback.
> >
> Yes the OS is Linux - on the 1650 version 2.6.14, on the 1950 version 2.6.18

--
Bill Moran
Collaborative Fusion Inc.
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
Bill Moran <wmoran@collaborativefusion.com> writes:
> In response to David Brain <dbrain@bandwidth.com>:
>> I am curious as to why 'top' gives such different output on the two
>> systems - the datasets are large and so I know I benefit from having
>> high shared_buffers and effective_cache_size settings.

> Have you done any actual queries on the new system? PG won't use the
> shm until it needs it -- and that doesn't occur until it gets a request
> for data via a query.

More accurately, top won't consider shared mem to be part of the process
address space until it's actually touched by that process.

regards, tom lane
Bill Moran wrote:
>
> Install the pg_buffercache contrib module and take a look at how shared
> memory is being used. I like to use MRTG to graph shared buffer usage
> over time, but you can just do a SELECT count(*) FROM pg_buffercache
> WHERE reldatabase IS NOT NULL to see how many buffers are actually in
> use.
>

Can you explain what you'd use as a diagnostic on this - I just
installed the module - but I'm not entirely clear as to what the output
is actually showing me and/or what would be considered good or bad.

Thanks,

David.

--
David Brain - bandwidth.com
dbrain@bandwidth.com
In response to David Brain <dbrain@bandwidth.com>:

> Bill Moran wrote:
> >
> > Install the pg_buffercache contrib module and take a look at how shared
> > memory is being used. I like to use MRTG to graph shared buffer usage
> > over time, but you can just do a SELECT count(*) FROM pg_buffercache
> > WHERE reldatabase IS NOT NULL to see how many buffers are actually in
> > use.
> >
>
> Can you explain what you'd use as a diagnostic on this - I just
> installed the module - but I'm not entirely clear as to what the output
> is actually showing me and/or what would be considered good or bad.

Well, there are different things you can do with it. See the README,
which I found pretty comprehensive.

What I was referring to was the ability to track how many shared_buffers
were actually in use, which can easily be seen at a cluster-wide view
with two queries:

select count(*) from pg_buffercache;
select count(*) from pg_buffercache where reldatabase is not null;

The first gives you the total number of buffers available (you could get
this from your postgresql.conf as well, but with automated collection and
graphing via mrtg, doing it this way guarantees that we'll always know
what the _real_ value is).

The second gives you the number of buffers that are actually holding
data.

If #2 is smaller than #1, that indicates that the entire working set of
your database is able to fit in shared memory. This might not be your
entire database, as some tables might never be queried (e.g. log tables
that are only queried when stuff goes wrong ...). This means that
Postgres is usually able to execute queries without going to the disk
for data, which usually equates to fast queries. If it's consistently
_much_ lower, it may indicate that your shared_buffers value is too
high, and the system may benefit from re-balancing memory usage.

If #2 is equal to #1, it probably means that your working set is larger
than the available shared buffers. This _may_ mean that your queries are
using the disk a lot, and that you _may_ benefit from increasing
shared_buffers, adding more RAM, sacrificing a 15000 RPM SCSI drive to
the gods of performance, etc ...

Another great thing to track is read activity. I do this via the
pg_stat_database view:

select sum(blks_hit) from pg_stat_database;
select sum(blks_read) from pg_stat_database;

(Note that you need block-level stats collection enabled to make these
usable.)

If the second one is increasing particularly fast, that's a strong
indication that a larger shared_buffers might improve performance. If
neither of them is increasing, that indicates that nobody's really doing
much with the database ;)

I strongly recommend that you graph these values using mrtg or cacti or
one of the many other programs designed to do that. It makes life nice
when someone says, "hey, the DB system was really slow yesterday while
you were busy in meetings, can you speed it up?"

--
Bill Moran
Collaborative Fusion Inc.
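
The two checks Bill describes can be turned into numbers with a small helper. This is a minimal sketch in Python and is not part of the original thread: the function names are made up for illustration, and the inputs are assumed to have been collected separately with the queries quoted above (the two pg_buffercache counts, and two successive samples of the pg_stat_database sums).

```python
def buffer_usage(total_buffers, used_buffers):
    """Return the fraction of shared buffers that currently hold data.

    total_buffers: result of  select count(*) from pg_buffercache
    used_buffers:  result of  select count(*) from pg_buffercache
                              where reldatabase is not null
    """
    if total_buffers <= 0:
        raise ValueError("total_buffers must be positive")
    return used_buffers / total_buffers


def hit_ratio(hit_before, read_before, hit_after, read_after):
    """Return the share of block requests served from shared buffers
    between two samples of sum(blks_hit) and sum(blks_read).

    Returns None when neither counter moved (an idle database).
    """
    hits = hit_after - hit_before
    reads = read_after - read_before
    if hits + reads == 0:
        return None
    return hits / (hits + reads)


if __name__ == "__main__":
    # Illustrative inputs only: 50000 approximates the configured value
    # mentioned in the thread; the counter samples are hypothetical.
    print(f"buffers in use: {buffer_usage(50000, 49151):.1%}")
    print(f"hit ratio:      {hit_ratio(1000, 200, 1900, 300):.1%}")
```

A usage fraction that stays close to 1.0 corresponds to the "#2 is equal to #1" case above, and a ratio that falls between samples corresponds to blks_read growing faster than blks_hit. Neither number has a fixed good or bad threshold; as the message says, they are most useful graphed over time.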
Thanks Bill for the explanation - that really helped me out
considerably.

What this showed me was that there were only 1024 buffers configured.
I'm not quite clear as to how this happened, as the postgresql.conf
files on both systems have shared_buffers set to ~50000. However it
looks as though the system start script was passing in -B 1024 to
postmaster, which was overriding the postgresql.conf settings. The
really odd thing is that the db start script is also the same on both
systems, so there is some other difference there that I need to track
down. However removing the -B 1024 allowed the settings to revert to the
file specified values.

So now I'm back to using ~50k buffers again and things are running a
little more swiftly, and according to pg_buffercache I'm using 49151 of
them (-:

Thanks again to those who helped me track this down.

David.

--
David Brain - bandwidth.com
dbrain@bandwidth.com
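
As a rough cross-check of the numbers in this thread, the buffer counts can be converted to memory sizes and compared with the shared-memory column of the top output in the first message. This is a minimal sketch, assuming the default PostgreSQL block size of 8 kB (a server built with a different block size would give different figures); the function name is made up for illustration.

```python
BLOCK_SIZE_BYTES = 8192  # assumption: default 8 kB PostgreSQL block size


def buffers_to_mib(n_buffers, block_size=BLOCK_SIZE_BYTES):
    """Convert a shared_buffers count into mebibytes of buffer space."""
    return n_buffers * block_size / (1024 * 1024)


if __name__ == "__main__":
    # 1024 is the value forced by "-B 1024"; 50000 approximates the
    # postgresql.conf setting reported in the thread.
    for n in (1024, 50000):
        print(f"{n:>6} buffers = {buffers_to_mib(n):.1f} MiB")
```

Under that assumption, 1024 buffers come to 8 MiB and 50000 buffers to roughly 390 MiB, which is broadly in line with the 11m and 386m shared figures top showed on the new and old machines. The match is only approximate, since a process maps other shared memory as well and top counts only the pages that process has touched.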