Thread: updating a row in a table with only one row
Hello everyone,

I'm using PostgreSQL 8.3.8 running on a server with 2 Xeon CPUs, 4GB
RAM, 4+2 disks in RAID 5 and CentOS 5.3. There's only one database,
which dumped with pg_dump takes ~0.5GB.

There are ~100 tables in the database and one of them (tableOne) always
contains only a single row. There's one index on it. However, performing
an update on the single row (which occurs every 60 secs) takes a
considerably long time -- around 200ms. The system is not loaded in any
way.

The table definition is:

CREATE TABLE tableOne (
    value1 BIGINT NOT NULL,
    value2 INTEGER NOT NULL,
    value3 INTEGER NOT NULL,
    value4 INTEGER NOT NULL,
    value5 INTEGER NOT NULL
);
CREATE INDEX tableOne_index1 ON tableOne (value5);

And the SQL query to update the _only_ row in the above table is
('value5' can't be used to identify the row as I don't know it at the
time):

UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;

And this is what EXPLAIN says on the above SQL query:

DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
(1 row)

What takes PostgreSQL so long? I guess I could add a fake 'id' column
and create an index on it to identify the single row, but still -- the
time seems quite ridiculous to me.

Thanks,
-- 
Michal (fuf@mageo.cz)
In response to Michal Vitecek:

> There are ~100 tables in the database and one of them (tableOne) always
> contains only a single row. There's one index on it. However performing

In this case, only one row, you don't need an index. Really.

> update on the single row (which occurs every 60 secs) takes a
> considerably long time -- around 200ms. The system is not loaded in any
> way.
>
> UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>
> And this is what EXPLAIN says on the above SQL query:
>
> DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
> LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>                        QUERY PLAN
> --------------------------------------------------------
>  Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
> (1 row)

tableOne or jackpot?

> What takes PostgreSQL so long? I guess I could add a fake 'id' column,
> create an index on it to identify the single row, but still -- the time
> seems quite ridiculous to me.

Maybe a lot of dead tuples; can you show us the output generated from
EXPLAIN ANALYSE? I would suggest you do a VACUUM FULL on this table.

Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> There are ~100 tables in the database and one of them (tableOne) always
> contains only a single row. There's one index on it. However performing
> update on the single row (which occurs every 60 secs) takes a
> considerably long time -- around 200ms. The system is not loaded in any
> way.

How often is the table VACUUMed? At the mentioned update rate, the
table has a daily growth of 1440 dead row versions.

Helder M. Vieira
On Fri, Oct 2, 2009 at 4:18 AM, Michal Vitecek <fuf@mageo.cz> wrote:
> Hello everyone,
>
> I'm using PostgreSQL 8.3.8 running on a server with 2 Xeon CPUs, 4GB
> RAM, 4+2 disks in RAID 5 and CentOS 5.3. There's only one database
> which dumped with pgdump takes ~0.5GB.
>
> There are ~100 tables in the database and one of them (tableOne) always
> contains only a single row. There's one index on it. However performing
> update on the single row (which occurs every 60 secs) takes a
> considerably long time -- around 200ms. The system is not loaded in any
> way.
>
> The table definition is:
>
> CREATE TABLE tableOne (
>     value1 BIGINT NOT NULL,
>     value2 INTEGER NOT NULL,
>     value3 INTEGER NOT NULL,
>     value4 INTEGER NOT NULL,
>     value5 INTEGER NOT NULL
> );
> CREATE INDEX tableOne_index1 ON tableOne (value5);
>
> And the SQL query to update the _only_ row in the above table is:
> ('value5' can't be used to identify the row as I don't know it at the
> time)
>
> UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>
> And this is what EXPLAIN says on the above SQL query:
>
> DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
> LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>                        QUERY PLAN
> --------------------------------------------------------
>  Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
> (1 row)
>
> What takes PostgreSQL so long? I guess I could add a fake 'id' column,
> create an index on it to identify the single row, but still -- the time
> seems quite ridiculous to me.

it is ridiculous.  your problem is almost definitely dead rows.  I
can't recall (and I can't find the info anywhere) if the 'hot' feature
requires an index to be active -- I think it does.  If so, creating a
dummy field and indexing it should resolve the problem.  Can you
confirm the dead row issue by doing vacuum verbose and create the
index?  please respond with your results, I'm curious.  Also, is
autovacuum on?  Have you measured iowait?

merlin
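Merlin's checklist can be run directly from psql. A minimal diagnostic sketch, using the table name from the thread (note that unquoted identifiers fold to lowercase in the statistics views, and that the n_live_tup/n_dead_tup counters are available in pg_stat_user_tables as of 8.3):

```sql
-- Is autovacuum enabled at all?
SHOW autovacuum;

-- Rough dead-tuple picture for the one-row table
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname = 'tableone';

-- Detailed per-page accounting of removable row versions
VACUUM VERBOSE tableOne;
```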
On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Oct 2, 2009 at 4:18 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>> [...]
>> There are ~100 tables in the database and one of them (tableOne) always
>> contains only a single row. There's one index on it. However performing
>> update on the single row (which occurs every 60 secs) takes a
>> considerably long time -- around 200ms. The system is not loaded in any
>> way.
>> [...]
>
> it is ridiculous.  your problem is almost definitely dead rows.  I
> can't recall (and I can't find the info anywhere) if the 'hot' feature
> requires an index to be active -- I think it does.  If so, creating a
> dummy field and indexing it should resolve the problem.  Can you
> confirm the dead row issue by doing vacuum verbose and create the
> index?  please respond with your results, I'm curious.  Also, is
> autovacuum on?  Have you measured iowait?

Since he's updating all the fields in the table, an index will
certainly ensure that HOT does not apply, no?

...Robert
On Fri, Oct 2, 2009 at 1:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> it is ridiculous.  your problem is almost definitely dead rows.  I
>> can't recall (and I can't find the info anywhere) if the 'hot' feature
>> requires an index to be active -- I think it does.  If so, creating a
>> dummy field and indexing it should resolve the problem.  Can you
>> confirm the dead row issue by doing vacuum verbose and create the
>> index?  please respond with your results, I'm curious.  Also, is
>> autovacuum on?  Have you measured iowait?
>
> Since he's updating all the fields in the table, an index will
> certainly ensure that HOT does not apply, no?

you're right...I missed that he put an index on value5 (why?).  That's
what's killing him.

merlin
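Whether updates are actually taking the HOT path can be verified from the statistics collector; a hedged sketch (the n_tup_hot_upd counter was added in 8.3):

```sql
-- Once the index on value5 is gone, n_tup_hot_upd should start
-- keeping pace with n_tup_upd for this table
SELECT n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables
 WHERE relname = 'tableone';
```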
Robert Haas wrote:
> On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> [...] If so, creating a
>> dummy field and indexing it should resolve the problem.  Can you
>> confirm the dead row issue by doing vacuum verbose and create the
>> index?  [...]
>
> Since he's updating all the fields in the table, an index will
> certainly ensure that HOT does not apply, no?

An extra index shouldn't hurt if you don't update the indexed dummy
column. But the existing tableOne_index1 will cause HOT to not apply,
if value5 is updated. I'd suggest dropping it (and not creating any
other indexes either); it won't do any good on a table with only one
row anyway.

If the table is indeed bloated, VACUUM FULL should shrink it back. I
wonder how it got to be that way, though. Autovacuum should keep a
table like that in check.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
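Heikki's advice boils down to two statements; a minimal sketch using the object names from the thread:

```sql
-- The one-row table gains nothing from the index, and updating the
-- indexed column (value5) disqualifies the update from HOT
DROP INDEX tableOne_index1;

-- Shrink the table back if dead row versions have bloated it
VACUUM FULL VERBOSE tableOne;
```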
Robert Haas wrote:
> On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> [...] your problem is almost definitely dead rows.  [...] Can you
>> confirm the dead row issue by doing vacuum verbose and create the
>> index?  please respond with your results, I'm curious.  Also, is
>> autovacuum on?  Have you measured iowait?

Autovacuum is on. I have dropped the superfluous index on value5. The
following is the result of running VACUUM VERBOSE ANALYZE on the table
after the database has been running for 3 days (it was restored from a
pg_dump 3 days ago).

DB=> vacuum verbose analyze tableOne;
INFO:  vacuuming "public.tableOne"
INFO:  "tableOne": found 82 removable, 1 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 141 unused item pointers.
1 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.tableOne"
INFO:  "tableOne": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
LOG:  duration: 23.833 ms  statement: vacuum verbose analyze tableOne;
VACUUM

The problem occurs on other tables as well, but on tableOne it is most
striking as the table is very simple. Also, I should mention that the
problem doesn't occur every time -- only in ~1/6 of the cases.

Could the problem be the HW RAID card? There's a ServerRAID 8k with
256MB with write-back enabled. Could it be that its internal cache
becomes full and all disk I/O operations are delayed until it writes
all changes to the hard drives?

Thanks,
-- 
Michal Vitecek (fuf@mageo.cz)
On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek <fuf@mageo.cz> wrote:
> Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
> with write-back enabled. Could it be that its internal cache becomes
> full and all disk I/O operations are delayed until it writes all
> changes to hard drives?

that's possible...the red flag is going to be iowait.  if your server
can't keep up with the sync demands, for example, you will eventually
outrun the write cache and you can start to see slow queries.  With
your server though it would take hundreds of (write) transactions per
second, minimum, to do that.

merlin
Merlin Moncure wrote:
> On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>
>> Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
>> with write-back enabled. Could it be that its internal cache becomes
>> full and all disk I/O operations are delayed until it writes all
>> changes to hard drives?
>
> that's possible...the red flag is going to be iowait.  if your server
> can't keep up with the sync demands for example, you will eventually
> outrun the write cache and you can start to see slow queries.

The problem is that the server is not loaded in any way. The iowait is
0.62%, there are only 72 sectors written/s, but the maximum await that
I saw was 28ms (!). Any attempts to reduce the time (I/O schedulers,
disabling bgwriter, increasing the number of checkpoints, decreasing
shared buffers, disabling the read cache on the card, etc.) didn't
help. After some 3-5 minutes there occurs a COMMIT which takes
100-10000x longer than usual. Setting fsync to off temporarily
improved the COMMIT times considerably, but I fear having this option
off all the time.

Is anybody else using the same RAID card? I suspect the problem lies
somewhere between the aacraid module and the card. The aacraid module
ignores setting the 'cache' parameter to 3 -- this should completely
disable the SYNCHRONIZE_CACHE command.

Any hints?

Thanks,
-- 
Michal Vitecek (fuf@mageo.cz)
On Tue, Oct 6, 2009 at 10:59 AM, Michal Vitecek <fuf@mageo.cz> wrote:
> Merlin Moncure wrote:
>> [...]
>
> The problem is that the server is not loaded in any way. The iowait is
> 0.62%, there's only 72 sectors written/s, but the maximum await that I
> saw was 28ms (!). [...] After some 3-5m there occurs a COMMIT which
> takes 100-10000x longer time than usual. Setting fsync to off
> temporarily improved the COMMIT times considerably but I fear to have
> this option off all the time.
>
> Is anybody else using the same RAID card? I suspect the problem lies
> somewhere between the aacraid module and the card. The aacraid module
> ignores setting of the 'cache' parameter to 3 -- this should completely
> disable the SYNCHRONIZE_CACHE command.

I think you're right.  One thing you can do is leave fsync on but
disable synchronous_commit.  This is a compromise between fsync on/off
(data is consistent following a crash, but you may lose some
transactions).

We need to know what iowait is at the precise moment you get the long
commit time.  Throw a top on it, give it a short update interval (like
.25 seconds), and watch.

merlin
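The compromise Merlin describes is a one-line setting, available as of 8.3; a sketch (it can be set per session, per transaction, or globally in postgresql.conf):

```sql
-- fsync stays on, so a crash cannot corrupt data; only the most
-- recently reported commits (up to ~3x wal_writer_delay) may be lost
SET synchronous_commit = off;
```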
Merlin Moncure wrote:
> [...]
> We need to know what iowait is at the precise moment you get the long
> commit time.  Throw a top, give it short update interval (like .25
> seconds), and watch.

top(1) has a batch mode (-b) that's useful for sending results to a file.

Craig
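For the record, a minimal sketch of that suggestion (flag spellings from procps top; the log filename is arbitrary):

```shell
# Batch mode (-b) writes plain text, so output can be redirected to a file.
# For the case in this thread one would run something like:
#   top -b -d 0.25 -n 240 > top.log
# and then inspect the %wa (iowait) column around a slow COMMIT.
# A single one-shot sample, just to show the batch-mode header:
top -b -n 1 | head -n 5
```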
Merlin Moncure wrote:
> [...]
> We need to know what iowait is at the precise moment you get the long
> commit time.  Throw a top, give it short update interval (like .25
> seconds), and watch.

I'm writing with the resolution to the problem: it was indeed caused
by the IBM ServerRAID 8k SAS RAID card. Putting the WAL logs onto a
separate hard drive (not in the RAID 5) helped tremendously with the
COMMIT times, and the occurrence of the very long SQL query times
dropped from every 3-5min to every ~45min, where only INSERT or UPDATE
queries were slow. Flashing the firmware of the RAID card and of all
hard drives fixed the problem altogether. The reason we waited so long
with the firmware updates is that IBM frequently puts a new version on
their servers and then, after a day or two, replaces it with a newer
version which fixes a critical bug introduced in the previous one.

Thanks,
-- 
Michal (fuf@mageo.cz)
On Mon, Oct 12, 2009 at 5:23 AM, Michal Vitecek <fuf@mageo.cz> wrote:
> [...]
> I'm writing with resolution to the problem: It was indeed caused by the
> IBM ServerRAID 8k SAS RAID card. Putting the WAL logs onto a separate
> (not in RAID 5) hard drive helped tremendously with the COMMIT times
> and the occurrence of the very long SQL query times dropped from 3-5min
> to ~45min where only INSERT or UPDATE queries were slow. Flashing
> firmware of the RAID card and of all hard drives fixed the problem
> altogether. [...]

I noticed similar behavior on a different raid controller (LSI-based
Dell Perc 5).  Things ran ok most of the time, but during periods of
moderate load and up, sometimes the write-back cache on the card would
fill up and flush.  During this operation the system would become
completely unresponsive for 2-20 seconds if fsync was on.  Needless to
say, on an OLTP system this is completely unacceptable.  A patch by
the vendor later reduced but did not completely fix the problem.

One thing about raid controllers I really don't like is that they have
a tendency to cause the o/s to lie about iowait...that really hurts
you from a diagnostic point of view.  This is why I've soured a bit on
hardware raid as a concept.  While the tools/features/bios
configuration is all nice, the raid controller is a black box that
completely defines the performance of i/o-bound systems...that's a
little scary.  Note I'm not advising you to run out and install
software raid everywhere, but these are certainly cautionary tales.

merlin