Thread: updating a row in a table with only one row

updating a row in a table with only one row

From
Michal Vitecek
Date:
Hello everyone,

 I'm using PostgreSQL 8.3.8 running on a server with 2 Xeon CPUs, 4GB
 RAM, 4+2 disks in RAID 5 and CentOS 5.3. There's only one database
 which dumped with pgdump takes ~0.5GB.

 There are ~100 tables in the database and one of them (tableOne) always
 contains only a single row. There's one index on it. However performing
 update on the single row (which occurs every 60 secs) takes a
 considerably long time -- around 200ms. The system is not loaded in any
 way.

 The table definition is:

 CREATE TABLE tableOne (
    value1      BIGINT NOT NULL,
    value2      INTEGER NOT NULL,
    value3      INTEGER NOT NULL,
    value4      INTEGER NOT NULL,
    value5      INTEGER NOT NULL,
 );
 CREATE INDEX tableOne_index1 ON tableOne (value5);

 And the SQL query to update the _only_ row in the above table is:
 ('value5' can't be used to identify the row as I don't know it at the
 time)

 UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;

 And this is what EXPLAIN says on the above SQL query:

 DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
 LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 =
newValue5;
                        QUERY PLAN
 --------------------------------------------------------
  Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
 (1 row)

 What takes PostgreSQL so long? I guess I could add a fake 'id' column,
 create an index on it to identify the single row, but still -- the time
 seems quite ridiculous to me.

        Thanks,
--
        Michal        (fuf@mageo.cz)

Re: updating a row in a table with only one row

From
"A. Kretschmer"
Date:
In response to Michal Vitecek :
>  There are ~100 tables in the database and one of them (tableOne) always
>  contains only a single row. There's one index on it. However performing

In this case, only one row, you don't need an index. Really.

>  update on the single row (which occurs every 60 secs) takes a
>  considerably long time -- around 200ms. The system is not loaded in any
>  way.
>
>  UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>
>  And this is what EXPLAIN says on the above SQL query:
>
>  DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>  LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 =
newValue5;
>                         QUERY PLAN
>  --------------------------------------------------------
>   Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
>  (1 row)

tableOne or jackpot?


>
>  What takes PostgreSQL so long? I guess I could add a fake 'id' column,
>  create an index on it to identify the single row, but still -- the time
>  seems quite ridiculous to me.

Maybe a lot of dead tuples, can you show us the output generated from
explain analyse?

I would suggest you to do a 'vacuum full' on this table.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)

Re: updating a row in a table with only one row

From
Hélder M. Vieira
Date:
> There are ~100 tables in the database and one of them (tableOne) always
> contains only a single row. There's one index on it. However performing
> update on the single row (which occurs every 60 secs) takes a
> considerably long time -- around 200ms. The system is not loaded in any
> way.


How often is the the table VACUUMed ?
At the mentioned update rate, the table has a daily growth of 1440 dead row
versions.

Helder M. Vieira




Re: updating a row in a table with only one row

From
Merlin Moncure
Date:
On Fri, Oct 2, 2009 at 4:18 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>  Hello everyone,
>
>  I'm using PostgreSQL 8.3.8 running on a server with 2 Xeon CPUs, 4GB
>  RAM, 4+2 disks in RAID 5 and CentOS 5.3. There's only one database
>  which dumped with pgdump takes ~0.5GB.
>
>  There are ~100 tables in the database and one of them (tableOne) always
>  contains only a single row. There's one index on it. However performing
>  update on the single row (which occurs every 60 secs) takes a
>  considerably long time -- around 200ms. The system is not loaded in any
>  way.
>
>  The table definition is:
>
>  CREATE TABLE tableOne (
>    value1      BIGINT NOT NULL,
>    value2      INTEGER NOT NULL,
>    value3      INTEGER NOT NULL,
>    value4      INTEGER NOT NULL,
>    value5      INTEGER NOT NULL,
>  );
>  CREATE INDEX tableOne_index1 ON tableOne (value5);
>
>  And the SQL query to update the _only_ row in the above table is:
>  ('value5' can't be used to identify the row as I don't know it at the
>  time)
>
>  UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>
>  And this is what EXPLAIN says on the above SQL query:
>
>  DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>  LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 =
newValue5;
>                        QUERY PLAN
>  --------------------------------------------------------
>  Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
>  (1 row)
>
>  What takes PostgreSQL so long? I guess I could add a fake 'id' column,
>  create an index on it to identify the single row, but still -- the time
>  seems quite ridiculous to me.

it is ridiculous.  your problem is almost definitely dead rows.  I
can't recall (and I can't find the info anywhere) if the 'hot' feature
requires an index to be active -- I think it does.  If so, creating a
dummy field and indexing it should resolve the problem.   Can you
confirm the dead row issue by doing vacuum verbose and create the
index?  please respond with your results, I'm curious.  Also, is
autovacuum on?  Have you measured iowait?

merlin

Re: updating a row in a table with only one row

From
Robert Haas
Date:
On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Oct 2, 2009 at 4:18 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>>  Hello everyone,
>>
>>  I'm using PostgreSQL 8.3.8 running on a server with 2 Xeon CPUs, 4GB
>>  RAM, 4+2 disks in RAID 5 and CentOS 5.3. There's only one database
>>  which dumped with pgdump takes ~0.5GB.
>>
>>  There are ~100 tables in the database and one of them (tableOne) always
>>  contains only a single row. There's one index on it. However performing
>>  update on the single row (which occurs every 60 secs) takes a
>>  considerably long time -- around 200ms. The system is not loaded in any
>>  way.
>>
>>  The table definition is:
>>
>>  CREATE TABLE tableOne (
>>    value1      BIGINT NOT NULL,
>>    value2      INTEGER NOT NULL,
>>    value3      INTEGER NOT NULL,
>>    value4      INTEGER NOT NULL,
>>    value5      INTEGER NOT NULL,
>>  );
>>  CREATE INDEX tableOne_index1 ON tableOne (value5);
>>
>>  And the SQL query to update the _only_ row in the above table is:
>>  ('value5' can't be used to identify the row as I don't know it at the
>>  time)
>>
>>  UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>>
>>  And this is what EXPLAIN says on the above SQL query:
>>
>>  DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>>  LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 =
newValue5;
>>                        QUERY PLAN
>>  --------------------------------------------------------
>>  Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
>>  (1 row)
>>
>>  What takes PostgreSQL so long? I guess I could add a fake 'id' column,
>>  create an index on it to identify the single row, but still -- the time
>>  seems quite ridiculous to me.
>
> it is ridiculous.  your problem is almost definitely dead rows.  I
> can't recall (and I can't find the info anywhere) if the 'hot' feature
> requires an index to be active -- I think it does.  If so, creating a
> dummy field and indexing it should resolve the problem.   Can you
> confirm the dead row issue by doing vacuum verbose and create the
> index?  please respond with your results, I'm curious.  Also, is
> autovacuum on?  Have you measured iowait?

Since he's updating all the fields in the table, an index will
certainly ensure that HOT does not apply, no?

...Robert

Re: updating a row in a table with only one row

From
Merlin Moncure
Date:
On Fri, Oct 2, 2009 at 1:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> it is ridiculous.  your problem is almost definitely dead rows.  I
>> can't recall (and I can't find the info anywhere) if the 'hot' feature
>> requires an index to be active -- I think it does.  If so, creating a
>> dummy field and indexing it should resolve the problem.   Can you
>> confirm the dead row issue by doing vacuum verbose and create the
>> index?  please respond with your results, I'm curious.  Also, is
>> autovacuum on?  Have you measured iowait?
>
> Since he's updating all the fields in the table, an index will
> certainly ensure that HOT does not apply, no?

you're right...I missed that he put an index on value5 (why?).  That's
what's killing him.

merlin

Re: updating a row in a table with only one row

From
Heikki Linnakangas
Date:
Robert Haas wrote:
> On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Fri, Oct 2, 2009 at 4:18 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>>>  Hello everyone,
>>>
>>>  I'm using PostgreSQL 8.3.8 running on a server with 2 Xeon CPUs, 4GB
>>>  RAM, 4+2 disks in RAID 5 and CentOS 5.3. There's only one database
>>>  which dumped with pgdump takes ~0.5GB.
>>>
>>>  There are ~100 tables in the database and one of them (tableOne) always
>>>  contains only a single row. There's one index on it. However performing
>>>  update on the single row (which occurs every 60 secs) takes a
>>>  considerably long time -- around 200ms. The system is not loaded in any
>>>  way.
>>>
>>>  The table definition is:
>>>
>>>  CREATE TABLE tableOne (
>>>    value1      BIGINT NOT NULL,
>>>    value2      INTEGER NOT NULL,
>>>    value3      INTEGER NOT NULL,
>>>    value4      INTEGER NOT NULL,
>>>    value5      INTEGER NOT NULL,
>>>  );
>>>  CREATE INDEX tableOne_index1 ON tableOne (value5);
>>>
>>>  And the SQL query to update the _only_ row in the above table is:
>>>  ('value5' can't be used to identify the row as I don't know it at the
>>>  time)
>>>
>>>  UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>>>
>>>  And this is what EXPLAIN says on the above SQL query:
>>>
>>>  DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>>>  LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5
=newValue5; 
>>>                        QUERY PLAN
>>>  --------------------------------------------------------
>>>  Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
>>>  (1 row)
>>>
>>>  What takes PostgreSQL so long? I guess I could add a fake 'id' column,
>>>  create an index on it to identify the single row, but still -- the time
>>>  seems quite ridiculous to me.
>> it is ridiculous.  your problem is almost definitely dead rows.  I
>> can't recall (and I can't find the info anywhere) if the 'hot' feature
>> requires an index to be active -- I think it does.  If so, creating a
>> dummy field and indexing it should resolve the problem.   Can you
>> confirm the dead row issue by doing vacuum verbose and create the
>> index?  please respond with your results, I'm curious.  Also, is
>> autovacuum on?  Have you measured iowait?
>
> Since he's updating all the fields in the table, an index will
> certainly ensure that HOT does not apply, no?

An extra index shouldn't hurt if you don't update the indexed dummy
column. But the existing tableOne_index1 will cause HOT to not apply, if
value5 is updated. I'd suggest dropping it (and not creating any other
indexes either), it won't do any good on a table with only one row anyway.

If the table is indeed bloated, VACUUM FULL should shrink it back. I
wonder how it got to be that way, though. Autovacuum should keep a table
like that in check.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: updating a row in a table with only one row

From
Michal Vitecek
Date:
Robert Haas wrote:
>On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Fri, Oct 2, 2009 at 4:18 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>>>  Hello everyone,
>>>
>>>  I'm using PostgreSQL 8.3.8 running on a server with 2 Xeon CPUs, 4GB
>>>  RAM, 4+2 disks in RAID 5 and CentOS 5.3. There's only one database
>>>  which dumped with pgdump takes ~0.5GB.
>>>
>>>  There are ~100 tables in the database and one of them (tableOne) always
>>>  contains only a single row. There's one index on it. However performing
>>>  update on the single row (which occurs every 60 secs) takes a
>>>  considerably long time -- around 200ms. The system is not loaded in any
>>>  way.
>>>
>>>  The table definition is:
>>>
>>>  CREATE TABLE tableOne (
>>>    value1      BIGINT NOT NULL,
>>>    value2      INTEGER NOT NULL,
>>>    value3      INTEGER NOT NULL,
>>>    value4      INTEGER NOT NULL,
>>>    value5      INTEGER NOT NULL,
>>>  );
>>>  CREATE INDEX tableOne_index1 ON tableOne (value5);
>>>
>>>  And the SQL query to update the _only_ row in the above table is:
>>>  ('value5' can't be used to identify the row as I don't know it at the
>>>  time)
>>>
>>>  UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>>>
>>>  And this is what EXPLAIN says on the above SQL query:
>>>
>>>  DB=> EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = newValue5;
>>>  LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5
=newValue5; 
>>>                        QUERY PLAN
>>>  --------------------------------------------------------
>>>  Seq Scan on tableOne  (cost=0.00..1.01 rows=1 width=14)
>>>  (1 row)
>>>
>>>  What takes PostgreSQL so long? I guess I could add a fake 'id' column,
>>>  create an index on it to identify the single row, but still -- the time
>>>  seems quite ridiculous to me.
>>
>> it is ridiculous.  your problem is almost definitely dead rows.  I
>> can't recall (and I can't find the info anywhere) if the 'hot' feature
>> requires an index to be active -- I think it does.  If so, creating a
>> dummy field and indexing it should resolve the problem.   Can you
>> confirm the dead row issue by doing vacuum verbose and create the
>> index?  please respond with your results, I'm curious.  Also, is
>> autovacuum on?  Have you measured iowait?

 Autovacuum is on. I have dropped the superfluous index on value5.

 The following is a result of running vacuum verbose analyze on the
 table after the database has been running for 3 days (it was restored
 from pgdump 3 days ago).

 DB=> vacuum verbose analyze tableOne;
 INFO:  vacuuming "public.tableOne"
 INFO:  "tableOne": found 82 removable, 1 nonremovable row versions in 1 pages
 DETAIL:  0 dead row versions cannot be removed yet.
 There were 141 unused item pointers.
 1 pages contain useful free space.
 0 pages are entirely empty.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.
 INFO:  analyzing "public.tableOne"
 INFO:  "tableOne": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total
rows
 LOG:  duration: 23.833 ms  statement: vacuum verbose analyze tableOne;
 VACUUM

 The problem occurs also on different tables but on tableOne this is
 most striking as it is very simple. Also I should mention that the
 problem doesn't occur every time -- but in ~1/6 cases.

 Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
 with write-back enabled. Could it be that its internal cache becomes
 full and all disk I/O operations are delayed until it writes all
 changes to hard drives?

        Thanks,
--
        Michal Vitecek        (fuf@mageo.cz)

Re: updating a row in a table with only one row

From
Merlin Moncure
Date:
On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek <fuf@mageo.cz> wrote:

>  Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
>  with write-back enabled. Could it be that its internal cache becomes
>  full and all disk I/O operations are delayed until it writes all
>  changes to hard drives?

that's possible...the red flag is going to be iowait. if your server
can't keep up with the sync demands for example, you will eventually
outrun the write cache and you can start to see slow queries.  With
your server though it would take in the hundreds of (write)
transactions per second to do that minimum.

merlin

Re: updating a row in a table with only one row

From
Michal Vitecek
Date:
Merlin Moncure wrote:
>On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>
>>  Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
>>  with write-back enabled. Could it be that its internal cache becomes
>>  full and all disk I/O operations are delayed until it writes all
>>  changes to hard drives?
>
>that's possible...the red flag is going to be iowait. if your server
>can't keep up with the sync demands for example, you will eventually
>outrun the write cache and you can start to see slow queries.  With
>your server though it would take in the hundreds of (write)
>transactions per second to do that minimum.

 The problem is that the server is not loaded in any way. The iowait is
 0.62%, there's only 72 sectors written/s, but the maximum await that I
 saw was 28ms (!). Any attempts to reduce the time (I/O schedulers,
 disabling bgwriter, increasing number of checkpoints, decreasing shared
 buffers, disabling read cache on the card etc.) didn't help. After some
 3-5m there occurs a COMMIT which takes 100-10000x longer time than
 usual. Setting fsynch to off Temporarily improved the COMMIT times
 considerably but I fear to have this option off all the time.

 Is anybody else using the same RAID card? I suspect the problem lies
 somewhere between the aacraid module and the card. The aacraid module
 ignores setting of the 'cache' parameter to 3 -- this should completely
 disable the SYNCHRONIZE_CACHE command.

 Any hints?

        Thanks,
--
        Michal Vitecek        (fuf@mageo.cz)

Re: updating a row in a table with only one row

From
Merlin Moncure
Date:
On Tue, Oct 6, 2009 at 10:59 AM, Michal Vitecek <fuf@mageo.cz> wrote:
> Merlin Moncure wrote:
>>On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>>
>>>  Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
>>>  with write-back enabled. Could it be that its internal cache becomes
>>>  full and all disk I/O operations are delayed until it writes all
>>>  changes to hard drives?
>>
>>that's possible...the red flag is going to be iowait. if your server
>>can't keep up with the sync demands for example, you will eventually
>>outrun the write cache and you can start to see slow queries.  With
>>your server though it would take in the hundreds of (write)
>>transactions per second to do that minimum.
>
>  The problem is that the server is not loaded in any way. The iowait is
>  0.62%, there's only 72 sectors written/s, but the maximum await that I
>  saw was 28ms (!). Any attempts to reduce the time (I/O schedulers,
>  disabling bgwriter, increasing number of checkpoints, decreasing shared
>  buffers, disabling read cache on the card etc.) didn't help. After some
>  3-5m there occurs a COMMIT which takes 100-10000x longer time than
>  usual. Setting fsynch to off Temporarily improved the COMMIT times
>  considerably but I fear to have this option off all the time.
>
>  Is anybody else using the same RAID card? I suspect the problem lies
>  somewhere between the aacraid module and the card. The aacraid module
>  ignores setting of the 'cache' parameter to 3 -- this should completely
>  disable the SYNCHRONIZE_CACHE command.

I think you're right.  One thing you can do is leave fsync on but
disable synchronous_commit.  This is compromise between fsync on/off
(data consistent following crash, but you may lose some transactions).

We need to know what iowait is at the precise moment you get the long
commit time.  Throw a top, give it short update interval (like .25
seconds), and watch.

merlin

Re: updating a row in a table with only one row

From
Craig James
Date:
Merlin Moncure wrote:
> On Tue, Oct 6, 2009 at 10:59 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>> Merlin Moncure wrote:
>>> On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>>>
>>>>  Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
>>>>  with write-back enabled. Could it be that its internal cache becomes
>>>>  full and all disk I/O operations are delayed until it writes all
>>>>  changes to hard drives?
>>> that's possible...the red flag is going to be iowait. if your server
>>> can't keep up with the sync demands for example, you will eventually
>>> outrun the write cache and you can start to see slow queries.  With
>>> your server though it would take in the hundreds of (write)
>>> transactions per second to do that minimum.
>>  The problem is that the server is not loaded in any way. The iowait is
>>  0.62%, there's only 72 sectors written/s, but the maximum await that I
>>  saw was 28ms (!). Any attempts to reduce the time (I/O schedulers,
>>  disabling bgwriter, increasing number of checkpoints, decreasing shared
>>  buffers, disabling read cache on the card etc.) didn't help. After some
>>  3-5m there occurs a COMMIT which takes 100-10000x longer time than
>>  usual. Setting fsynch to off Temporarily improved the COMMIT times
>>  considerably but I fear to have this option off all the time.
>>
>>  Is anybody else using the same RAID card? I suspect the problem lies
>>  somewhere between the aacraid module and the card. The aacraid module
>>  ignores setting of the 'cache' parameter to 3 -- this should completely
>>  disable the SYNCHRONIZE_CACHE command.
>
> I think you're right.  One thing you can do is leave fsync on but
> disable synchronous_commit.  This is compromise between fsync on/off
> (data consistent following crash, but you may lose some transactions).
>
> We need to know what iowait is at the precise moment you get the long
> commit time.  Throw a top, give it short update interval (like .25
> seconds), and watch.

top(1) has a batch mode (-b) that's useful for sending results to a file.

Craig

Re: updating a row in a table with only one row

From
Michal Vitecek
Date:
Merlin Moncure wrote:
>On Tue, Oct 6, 2009 at 10:59 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>> Merlin Moncure wrote:
>>>On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>>>
>>>>  Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
>>>>  with write-back enabled. Could it be that its internal cache becomes
>>>>  full and all disk I/O operations are delayed until it writes all
>>>>  changes to hard drives?
>>>
>>>that's possible...the red flag is going to be iowait. if your server
>>>can't keep up with the sync demands for example, you will eventually
>>>outrun the write cache and you can start to see slow queries.  With
>>>your server though it would take in the hundreds of (write)
>>>transactions per second to do that minimum.
>>
>>  The problem is that the server is not loaded in any way. The iowait is
>>  0.62%, there's only 72 sectors written/s, but the maximum await that I
>>  saw was 28ms (!). Any attempts to reduce the time (I/O schedulers,
>>  disabling bgwriter, increasing number of checkpoints, decreasing shared
>>  buffers, disabling read cache on the card etc.) didn't help. After some
>>  3-5m there occurs a COMMIT which takes 100-10000x longer time than
>>  usual. Setting fsynch to off Temporarily improved the COMMIT times
>>  considerably but I fear to have this option off all the time.
>>
>>  Is anybody else using the same RAID card? I suspect the problem lies
>>  somewhere between the aacraid module and the card. The aacraid module
>>  ignores setting of the 'cache' parameter to 3 -- this should completely
>>  disable the SYNCHRONIZE_CACHE command.
>
>I think you're right.  One thing you can do is leave fsync on but
>disable synchronous_commit.  This is compromise between fsync on/off
>(data consistent following crash, but you may lose some transactions).
>
>We need to know what iowait is at the precise moment you get the long
>commit time.  Throw a top, give it short update interval (like .25
>seconds), and watch.

 I'm writing with resolution to the problem: It was indeed caused by the
 IBM ServerRAID 8k SAS RAID card. Putting the WAL logs onto a separate
 (not in RAID 5) hard drive helped tremendously with the COMMIT times
 and the occurrence of the very long SQL query times dropped from 3-5min
 to ~45min where only INSERT or UPDATE queries were slow. Flashing
 firmware of the RAID card and of all hard drives fixed the problem
 altogether. To explain why we waited for so long with the firmware
 updates was because of the fact that IBM frequently puts a new version
 on their servers and then, after a day or two, replaces it with a newer
 version which fixes a critical bug introduced in the previous one.

        Thanks,
--
        Michal        (fuf@mageo.cz)

Re: updating a row in a table with only one row

From
Merlin Moncure
Date:
On Mon, Oct 12, 2009 at 5:23 AM, Michal Vitecek <fuf@mageo.cz> wrote:
> Merlin Moncure wrote:
>>On Tue, Oct 6, 2009 at 10:59 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>>> Merlin Moncure wrote:
>>>>On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek <fuf@mageo.cz> wrote:
>>>>
>>>>>  Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
>>>>>  with write-back enabled. Could it be that its internal cache becomes
>>>>>  full and all disk I/O operations are delayed until it writes all
>>>>>  changes to hard drives?
>>>>
>>>>that's possible...the red flag is going to be iowait. if your server
>>>>can't keep up with the sync demands for example, you will eventually
>>>>outrun the write cache and you can start to see slow queries.  With
>>>>your server though it would take in the hundreds of (write)
>>>>transactions per second to do that minimum.
>>>
>>>  The problem is that the server is not loaded in any way. The iowait is
>>>  0.62%, there's only 72 sectors written/s, but the maximum await that I
>>>  saw was 28ms (!). Any attempts to reduce the time (I/O schedulers,
>>>  disabling bgwriter, increasing number of checkpoints, decreasing shared
>>>  buffers, disabling read cache on the card etc.) didn't help. After some
>>>  3-5m there occurs a COMMIT which takes 100-10000x longer time than
>>>  usual. Setting fsynch to off Temporarily improved the COMMIT times
>>>  considerably but I fear to have this option off all the time.
>>>
>>>  Is anybody else using the same RAID card? I suspect the problem lies
>>>  somewhere between the aacraid module and the card. The aacraid module
>>>  ignores setting of the 'cache' parameter to 3 -- this should completely
>>>  disable the SYNCHRONIZE_CACHE command.
>>
>>I think you're right.  One thing you can do is leave fsync on but
>>disable synchronous_commit.  This is compromise between fsync on/off
>>(data consistent following crash, but you may lose some transactions).
>>
>>We need to know what iowait is at the precise moment you get the long
>>commit time.  Throw a top, give it short update interval (like .25
>>seconds), and watch.
>
>  I'm writing with resolution to the problem: It was indeed caused by the
>  IBM ServerRAID 8k SAS RAID card. Putting the WAL logs onto a separate
>  (not in RAID 5) hard drive helped tremendously with the COMMIT times
>  and the occurrence of the very long SQL query times dropped from 3-5min
>  to ~45min where only INSERT or UPDATE queries were slow. Flashing
>  firmware of the RAID card and of all hard drives fixed the problem
>  altogether. To explain why we waited for so long with the firmware
>  updates was because of the fact that IBM frequently puts a new version
>  on their servers and then, after a day or two, replaces it with a newer
>  version which fixes a critical bug introduced in the previous one.

I noticed similar behavior on a different raid controller (LSI based
Dell Perc 5).  Things ran ok most of the time, but during periods of
moderate load and up sometimes the write back cache on the card will
fill up and flash.  During this operation the system would become
completely unresponsive for 2-20 seconds if fsync was on.  Needless to
say, on an OLTP system this is completely unacceptable.  A patch by
the vendor later reduced but did not completely fix the problem.  One
things about raid controllers I really don't like is that they have a
tendency to cause the o/s to lie about iowait...really hurts you from
a diagnostic point of view.

This is why I've soured a bit on hardware raid as a concept.  While
the tools/features/bios configuration is all nice, the raid controller
is a black box that completely defines the performance if i/o bound
systems...that's a little scary.  Note I'm not advising to run out and
go install software raid everywhere, but these are certainly
cautionary tales.

merlin