Thread: Identifying the nature of blocking I/O
[For the purpose of this post, 'blocking' refers to an I/O operation taking a long time for reasons other than the amount of work the I/O operation itself actually implies; not to use of blocking I/O calls or anything like that.] Hello, I have a situation in which deterministic latency is a lot more important than throughput. I realize this is a hugely complex topic and that there is interaction between many different things (pg buffer cache, os buffer cache, raid controller caching, wal buffers, storage layout, etc). I already know several things I definitely want to do to improve things. But in general, it would be very interesting to see, at any given moment, what PostgreSQL backends are actually blocking on from the perspective of PostgreSQL. So for example, if I have 30 COMMITs that are active, to know whether they are simply waiting on the WAL fsync or actually waiting on a data fsync because a checkpoint is being created. Or similarly, for non-commits, whether they are blocking because the WAL buffers are full and writing them out is blocking, etc. This would make it easier to observe and draw conclusions when tweaking different things in pg/the os/the raid controller. Is there currently a way of dumping such information? I.e., asking PG "what are backends waiting on right now?". -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
Peter Schuller wrote: > But in general, it would be very interesting to see, at any given > moment, what PostgreSQL backends are actually blocking on from the > perspective of PostgreSQL. The recent work on DTrace support for PostgreSQL will probably give you the easiest path to useful results. You'll probably need an OpenSolaris or (I think) FreeBSD host, though, rather than a Linux host. -- Craig Ringer
More info/notes on DTrace --
DTrace is available now on MacOSX, Solaris 10, OpenSolaris, and FreeBSD.
Linux however is still in the dark ages when it comes to system monitoring, especially with I/O.
You can write some custom DTrace scripts to map any of the basic Postgres operations or processes to things that it is waiting on in the OS. You can definitely write a script that would be able to track the I/O in reads and writes caused by a transaction, how long those took, what the I/O sizes were, and even what portion of the disk it went to.
http://lethargy.org/~jesus/archives/74-PostgreSQL-performance-through-the-eyes-of-DTrace.html
http://www.brendangregg.com/dtrace.html#DTraceToolkit
Even without the custom DTrace probes in Postgres, DTrace gives you the ability to see what the OS is doing, how long it is taking, and what processes, files, locks, or other things are involved. Most important, however, is the ability to correlate things rather than deal only with high-level aggregates, as more simplistic tools do. It takes some work and it is not the easiest thing to use, as its power comes at a complexity cost.
On Sun, Aug 24, 2008 at 5:30 PM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Peter Schuller wrote:
>> But in general, it would be very interesting to see, at any given
>> moment, what PostgreSQL backends are actually blocking on from the
>> perspective of PostgreSQL.
> The recent work on DTrace support for PostgreSQL will probably give you
> the easiest path to useful results. You'll probably need an OpenSolaris
> or (I think) FreeBSD host, though, rather than a Linux host.
> --
> Craig Ringer
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Craig Ringer <craig@postnewspapers.com.au> writes: > Peter Schuller wrote: >> But in general, it would be very interesting to see, at any given >> moment, what PostgreSQL backends are actually blocking on from the >> perspective of PostgreSQL. > The recent work on DTrace support for PostgreSQL will probably give you > the easiest path to useful results. You'll probably need an OpenSolaris > or (I think) FreeBSD host, though, rather than a Linux host. <cant-resist>get a mac</cant-resist> (Mind you, I don't think Apple sells any hardware that would be really suitable for a big-ass database server. But for development purposes, OS X on a recent laptop is a pretty nice unix-at-the-core-plus-eye-candy environment.) regards, tom lane
"Scott Carey" <scott@richrelevance.com> writes: > DTrace is available now on MacOSX, Solaris 10, OpenSolaris, and FreeBSD. > Linux however is still in the dark ages when it comes to system monitoring, > especially with I/O. Oh, after poking around a bit, I should note that some of my Red Hat compatriots think that "systemtap" is the long-term Linux answer here. I know zip about it myself, but it's something to read up on if you are looking for better performance monitoring on Linux. regards, tom lane
Tom Lane wrote: > "Scott Carey" <scott@richrelevance.com> writes: > > DTrace is available now on MacOSX, Solaris 10, OpenSolaris, and FreeBSD. > > Linux however is still in the dark ages when it comes to system monitoring, > > especially with I/O. > > Oh, after poking around a bit, I should note that some of my Red Hat > compatriots think that "systemtap" is the long-term Linux answer here. > I know zip about it myself, but it's something to read up on if you are > looking for better performance monitoring on Linux. FWIW there are a number of tracing options on Linux, none of which is said to be yet at the level of DTrace. See here for an article on the topic: http://lwn.net/Articles/291091/ -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, Aug 25, 2008 at 3:34 AM, Scott Carey <scott@richrelevance.com> wrote:
> DTrace is available now on MacOSX, Solaris 10, OpenSolaris, and FreeBSD.
> Linux however is still in the dark ages when it comes to system monitoring,
> especially with I/O.
While that's true, newer 2.6 kernel versions at least have I/O accounting built in, something which only used to be available through the "atop" accounting kernel patch:
$ cat /proc/22785/io
rchar: 31928
wchar: 138
syscr: 272
syscw: 4
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
Alexander.
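As a rough illustration of how the counters shown above could be consumed by a script, here is a minimal Python sketch that turns the text of /proc/<pid>/io into a dictionary. The names `parse_proc_io` and `read_proc_io` are made up for this example and are not part of any tool mentioned in the thread; the field names are the ones in the output above. Reading the file assumes a Linux kernel with I/O accounting enabled and permission to read the target process's /proc entry.

```python
from pathlib import Path

# Sample text copied from the /proc/22785/io output quoted in the thread.
SAMPLE = """\
rchar: 31928
wchar: 138
syscr: 272
syscw: 4
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
"""


def parse_proc_io(text):
    """Parse 'name: value' lines (the /proc/<pid>/io format) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines or lines without a colon
        counters[name.strip()] = int(value.strip())
    return counters


def read_proc_io(pid):
    """Hypothetical helper: read and parse /proc/<pid>/io (Linux only)."""
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())


if __name__ == "__main__":
    for name, value in parse_proc_io(SAMPLE).items():
        print(f"{name:>24} {value}")
```

Note that, as far as the kernel documentation describes these fields, rchar and wchar count bytes passed through read and write system calls (including data served from the page cache), while read_bytes and write_bytes count bytes the process actually caused to be fetched from or sent to the storage layer. That would explain why the sample shows non-zero rchar with zero read_bytes.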
This isn't exactly on topic, but it is sometimes helpful. If you have a kernel >= 2.6.20 with I/O accounting enabled (it needs to be compiled with CONFIG_TASKSTATS=y, CONFIG_TASK_DELAY_ACCT=y, CONFIG_TASK_XACCT=y and CONFIG_TASK_IO_ACCOUNTING=y) and the sysstat package (>= 7.1.5) installed, you can use the "pidstat" command, which shows you the processes doing I/O in kB/s.
Robert
Alexander Staubo wrote:
> On Mon, Aug 25, 2008 at 3:34 AM, Scott Carey <scott@richrelevance.com> wrote:
>> DTrace is available now on MacOSX, Solaris 10, OpenSolaris, and FreeBSD.
>> Linux however is still in the dark ages when it comes to system monitoring,
>> especially with I/O.
>
> While that's true, newer 2.6 kernel versions at least have I/O
> accounting built in, something which only used to be available through
> the "atop" accounting kernel patch:
>
> $ cat /proc/22785/io
> rchar: 31928
> wchar: 138
> syscr: 272
> syscw: 4
> read_bytes: 0
> write_bytes: 0
> cancelled_write_bytes: 0
>
> Alexander.
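To make the "I/O in kB/s" idea concrete, the sketch below derives per-process rates from two samples of the read_bytes and write_bytes counters taken some seconds apart. This is only an approximation of the kind of figure pidstat reports, not pidstat's actual implementation; the function name `io_rates_kb_per_s` and the output keys are assumptions chosen for the example, and the input dictionaries are assumed to hold counters as they appear in /proc/<pid>/io.

```python
def io_rates_kb_per_s(before, after, interval_s):
    """Return read/write rates in kB/s between two counter samples.

    `before` and `after` are dicts with 'read_bytes' and 'write_bytes'
    keys, sampled `interval_s` seconds apart (hypothetical interface).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    read_delta = after["read_bytes"] - before["read_bytes"]
    write_delta = after["write_bytes"] - before["write_bytes"]
    return {
        "kB_rd/s": read_delta / 1024 / interval_s,
        "kB_wr/s": write_delta / 1024 / interval_s,
    }


if __name__ == "__main__":
    # Made-up sample values, two seconds apart.
    first = {"read_bytes": 0, "write_bytes": 0}
    second = {"read_bytes": 2048, "write_bytes": 10240}
    print(io_rates_kb_per_s(first, second, 2.0))
```

Rates like these show which backend is generating I/O, but not why a particular call is slow, which is the gap the original question is about.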
On Fri, Aug 22, 2008 at 7:52 AM, Peter Schuller <peter.schuller@infidyne.com> wrote: > Is there currently a way of dumping such information? I.e., asking PG > "what are backends waiting on right now?". Unfortunately, not within Postgres itself. The question, "what is the database waiting on?" is a good one, and one Oracle understood in the early 90's. It is for that reason that EnterpriseDB added RITA, the Runtime Instrumentation and Tracing Architecture, to their Advanced Server product. RITA gives DBAs some of the same information as the Oracle Wait Interface does regarding what the database is waiting for, such as locks, I/O, and which relation/block. While it's not as efficient as DTrace due to Linux's lack of a good high-resolution user-mode timer, no one has found it to have a noticeable overhead on the throughput of a system in benchmarks or real-world applications. If you're on a DTrace platform, I would suggest using it. Otherwise, you can try to use strace/ltrace on Linux, but that's probably not going to get you the answers you need quickly or easily enough. Until enough users ask for this type of feature, the community isn't going to see it as valuable enough to add to the core engine. IIRC, systemtap is pretty much dead :( -- Jonah H. Harris, Senior DBA myYearbook.com
On Sun, 24 Aug 2008, Tom Lane wrote: > Mind you, I don't think Apple sells any hardware that would be really > suitable for a big-ass database server. If you have money to burn, you can get an XServe with up to 8 cores and 32GB of RAM, and get a card to connect it to a Fibre Channel disk array. For only moderately large requirements, you can even get a card with 256MB of battery-backed cache (rebranded LSI) to attach the 3 drives in the chassis. None of these are very cost-effective compared to servers like the popular HP models people mention here regularly, but it is possible. As for Systemtap on Linux, it might be possible that it will accumulate enough of a standard library to be usable by regular admins one day, but I don't see any sign that's a priority for development. Right now what you have to know in order to write useful scripts is so much more complicated than DTrace, where there are all sorts of useful things you can script trivially. I think a good part of DTrace's success comes from flattening that learning curve. Take a look at the one-liners at http://www.solarisinternals.com/wiki/index.php/DTraceToolkit and compare them against http://sourceware.org/systemtap/examples/ That complexity works against the tool on so many levels. For example, I can easily imagine selling even a paranoid admin on running a simple DTrace script like the one-line examples. Whereas every Systemtap example I've seen looks pretty scary at first, and I can't imagine a DBA in a typical enterprise environment being able to convince their associated admin team they're perfectly safe to run in production. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD