Thread: concurrent IO in postgres?
When testing the IO performance of the ioSAN storage device from FusionIO (650GB MLC version), one of the things I tried is a set of IO-intensive operations in Postgres: bulk data loads, updates, and queries calling for random IO. So far I cannot make Postgres take advantage of this tremendous IO capacity. I can squeeze out a factor of a few here and there when caching cannot be utilized, but this hardware can do a lot more.

Low-level testing with fio shows on average 10x speedups over disk for sequential IO and 500-800x for random IO. With enough threads I can get IOPS in the 100-200K range and 1-1.5GB/s of bandwidth, basically what's advertised. But not with Postgres.

Is this because the Postgres backend is essentially single threaded and in general does not perform asynchronous IO, or am I missing something? I found that the effective_io_concurrency parameter only takes effect for bitmap index scans. Also, is there any work going on to allow concurrent IO in the backend and adapt Postgres to the capabilities of flash?

I'd appreciate any comments, experiences, etc.

Przemek Wozniak
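(For reference, the kind of low-level fio random-read test described above would look something like the job file below; the device path, block size, and thread counts are placeholders rather than the exact settings used.)

    [global]
    ioengine=libaio     # asynchronous IO on Linux
    direct=1            # bypass the OS page cache
    time_based
    runtime=60

    [randread]
    filename=/dev/fioa  # hypothetical Fusion-io device node
    rw=randread
    bs=4k
    iodepth=32          # queue depth per thread
    numjobs=16          # parallel threads to saturate the device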
On Thu, Dec 23, 2010 at 10:37 AM, Przemek Wozniak <wozniak@lanl.gov> wrote:
> When testing the IO performance of the ioSAN storage device from FusionIO
> (650GB MLC version), one of the things I tried is a set of IO-intensive
> operations in Postgres: bulk data loads, updates, and queries calling
> for random IO. So far I cannot make Postgres take advantage of this

So, were you running a lot of these at once? Or just single threaded? I get very good IO concurrency with lots of parallel PostgreSQL connections on a 34-disk SAS setup with a battery-backed controller.
Typically my problem is that the large queries are simply CPU bound. Do you have sar/top output you can share? I'm currently setting up two FusionIO Duo @640GB in an LVM stripe to do some testing with; I will publish the results after I'm done.

If anyone has tests/suggestions they would like to see done, please let me know.

- John
John W Strange <john.w.strange@jpmchase.com> wrote:
> Typically my problem is that the large queries are simply CPU
> bound.

Well, if your bottleneck is CPU, then you're obviously not going to be driving another resource (like disk) to its limit. First, though, I want to confirm that your "CPU bound" case isn't in the "I/O Wait" category of CPU time. What does `vmstat 1` show while you're CPU bound?

If it's not I/O Wait time, then you need to try to look at the queries involved. If you're not hitting the disk because most of the active data is cached, that would normally be a good thing. What kind of throughput are you seeing? Do you need better?

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin
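(As an illustration -- this is the stock Linux vmstat header, though exact columns vary by platform. The key column is "wa", CPU time spent waiting on I/O; high "us"/"sy" with near-zero "wa" means genuinely CPU bound.)

    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa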
On Thu, 2010-12-23 at 11:24 -0700, Scott Marlowe wrote:
> So, were you running a lot of these at once? Or just single threaded?
> I get very good IO concurrency with lots of parallel PostgreSQL
> connections on a 34-disk SAS setup with a battery-backed controller.

In one test I was running between 1 and 32 clients simultaneously writing lots of data using COPY BINARY. The problem is that with a large RAM buffer it all goes there, and then the background writer, a single Postgres process, will issue write requests one at a time, I suspect. So the actual IO is effectively serialized by the backend.
--- On Thu, 12/23/10, John W Strange <john.w.strange@jpmchase.com> wrote:
> I'm currently setting up two FusionIO Duo @640GB in an LVM stripe to do
> some testing with; I will publish the results after I'm done.
>
> If anyone has tests/suggestions they would like to see done, please
> let me know.

Somewhat tangential to the current topic: I've heard that FusionIO uses an internal cache and hence is not crash-safe, and that if the cache is turned off performance takes a big hit. Is that your experience?
On Dec 23, 2010, at 11:58 AM, Andy wrote:
> Somewhat tangential to the current topic: I've heard that FusionIO uses
> an internal cache and hence is not crash-safe, and that if the cache is
> turned off performance takes a big hit. Is that your experience?

It does use an internal cache, but it also has onboard battery power. The driver needs to put its house in order when restarting after an unclean shutdown, however, and that can take up to 30 minutes per card.
On Dec 23, 2010, at 13:22:32, Ben Chobot wrote:
> It does use an internal cache, but it also has onboard battery power.
> The driver needs to put its house in order when restarting after an
> unclean shutdown, however, and that can take up to 30 minutes per card.

Sorry to intrude here, but I'd like to clarify the behavior of the Fusion-io devices. Unlike SSDs, we do not use an internal cache, nor do we use batteries. (We *do* have a small internal FIFO, with capacitive hold-up, that is 100% guaranteed to be written to our persistent storage in the event of unexpected power failure.)

When a write() to a Fusion-io device has been acknowledged, the data is guaranteed to be stored safely. This is a strict requirement for any enterprise-ready storage device.

Thanks,
John Cagle
Fusion-io, Inc.
John,

> When a write() to a Fusion-io device has been acknowledged, the data is
> guaranteed to be stored safely. This is a strict requirement for any
> enterprise-ready storage device.

Thanks for the clarification! While you're here, any general advice on configuring FusionIO devices for database access, or vice versa?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
I wonder how the OP configured effective_io_concurrency; even on a single drive with command queuing, the posix_fadvise() calls that result do make a difference...
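(For reference, a sketch of how it would be set -- the value here is just an assumption for a highly parallel device. In 9.0 it defaults to 1, only affects bitmap heap scans, and is only available on platforms with posix_fadvise.)

    -- per session, before running the query:
    SET effective_io_concurrency = 32;

    # or globally, in postgresql.conf:
    effective_io_concurrency = 32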
On Thu, Dec 23, 2010 at 11:46 AM, Przemek Wozniak <wozniak@lanl.gov> wrote:
> In one test I was running between 1 and 32 clients simultaneously
> writing lots of data using COPY BINARY.

Are you bypassing WAL? If not, you are likely serializing on that -- not so much the writing, but the lock.

> The problem is that with a large RAM buffer it all goes there, and then
> the background writer, a single Postgres process, will issue write
> requests one at a time, I suspect.

But those "writes" are probably just copies of 8K into the kernel's RAM, and so very fast.

> So the actual IO is effectively serialized by the backend.

If the background writer cannot keep up, then the individual backends start doing writes as well, so it isn't really serialized.

Cheers,
Jeff
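(For the record, the WAL bypass Jeff refers to applies when the table is created or truncated in the same transaction that loads it, with WAL archiving off -- wal_level = minimal in 9.0. A sketch; table and file names are hypothetical:)

    BEGIN;
    TRUNCATE bulk_target;  -- or CREATE TABLE bulk_target (...) here
    COPY bulk_target FROM '/tmp/data.bin' BINARY;
    COMMIT;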
Jeff Janes wrote:
> If the background writer cannot keep up, then the individual backends
> start doing writes as well, so it isn't really serialized.

Is there any parameter governing that behavior? Can you tell me where in the code (version 9.0.2) I can find that? Thanks.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
On 12/25/10, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
> Jeff Janes wrote:
>> If the background writer cannot keep up, then the individual backends
>> start doing writes as well, so it isn't really serialized.
>
> Is there any parameter governing that behavior?

No, it is automatic. There are parameters governing how likely it is that the bgwriter falls behind in the first place, though:

http://www.postgresql.org/docs/9.0/static/runtime-config-resource.html

In particular, bgwriter_lru_maxpages could be made bigger and/or bgwriter_delay smaller. But bulk COPY BINARY might use a non-default allocation strategy, and I don't know enough about that part of the code to assess its interaction with the bgwriter.

> Can you tell me where in the code (version 9.0.2) I can find that?

bufmgr.c, specifically BufferAlloc.

Cheers,
Jeff
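(To illustrate, the relevant postgresql.conf entries; the 9.0 defaults are noted in the comments, and the new values are only examples of moving in the direction Jeff describes:)

    bgwriter_delay = 50ms           # default 200ms; wake the bgwriter more often
    bgwriter_lru_maxpages = 500     # default 100; allow more writes per round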
Jeff Janes wrote:
> There are parameters governing how likely it is that bgwriter falls
> behind in the first place, though.
>
> http://www.postgresql.org/docs/9.0/static/runtime-config-resource.html
>
> In particular bgwriter_lru_maxpages could be made bigger and/or
> bgwriter_delay smaller.

Also, one of the structures used for caching the list of fsync requests the background writer is handling -- the thing that results in backend writes when it can't keep up -- is proportional to the size of shared_buffers on the server. Setting that tunable to a reasonable size and lowering bgwriter_delay are the two things that help most for the background writer to keep up with overall load, rather than having backends write their own buffers. And given the way checkpoints in PostgreSQL work, having more backend writes is generally not a performance-improving change, even though it does have the property that it gets more processes writing at once.

The opening post in this thread really didn't say whether any PostgreSQL server tuning or OS tuning was done to try to optimize performance. The usual list at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server is normally a help.

At the kernel level, the #1 thing I find necessary to get decent bulk performance in a lot of situations is proper read-ahead. On Linux, for example, you must get the OS doing readahead to compensate for the fact that PostgreSQL issues requests in a serial sequence: it's going to ask for block #1, then block #2, then block #3, etc. If the OS doesn't pick up on that pattern and read blocks 4, 5, 6, etc. before the server asks for them -- keeping the disk fully occupied and returning the database's data fast from the kernel buffers -- you'll never reach the full potential even of a regular hard drive. And the default readahead on Linux is far too low for modern hardware.

> But bulk copy binary might use a nondefault allocation strategy, and I
> don't know enough about that part of the code to assess the
> interaction of that with bgwriter.

It's documented pretty well in src/backend/storage/buffer/README, specifically the "Buffer Ring Replacement Strategy" section. Sequential scan reads, VACUUM, COPY IN, and CREATE TABLE AS SELECT are the operations that get one of the more specialized buffer replacement strategies. These all use the same basic approach, which is to reuse a ring of buffers rather than running rampant over the whole buffer cache; the main difference between them is the size of the ring. Inside freelist.c, the GetAccessStrategy code lets you see the size you get in each of these modes.

Since PostgreSQL reads and writes through the OS buffer cache in addition to its own shared_buffers pool, this whole ring buffer scheme doesn't protect the OS cache from being trashed by a big bulk operation. Your only real defense there is to make shared_buffers large enough that it retains a decent chunk of data even in the wake of that.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
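(To make Greg's readahead point concrete: on Linux the per-device setting is expressed in 512-byte sectors and can be inspected and raised with blockdev. The device name and the new value below are examples only.)

    blockdev --getra /dev/sda        # often 256, i.e. 128KB, by default
    blockdev --setra 4096 /dev/sda   # 2MB readahead, an example value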