Thread: High cpu usage after many inserts
Hi list,
We are running postgresql 8.3.5 and are trying to stress test our LMS.
The problem is when our stress tester (Jmeter) inserts around 10,000 rows (in 3 hours) over 2 tables (5000 rows each table) the CPU of the sql server hits 100% over all 4 cores for all future inserts.
I have tried numerous things to get the cpu back down but so far the only thing that works is deleting the 10,000 rows Jmeter inserted.
For more information on the problem along with a time stamped list of test results and outcomes please see http://spreadsheets.google.com/pub?key=pu_k0R6vNvOVP26TRZdtdYw
Any help would be appreciated
Regards,
Jordan Tomkinson
System Administrator
Moodle HQ
On Sun, Feb 22, 2009 at 11:55 PM, Jordan Tomkinson <jordan@moodle.com> wrote:
> Hi list,
>
> We are running postgresql 8.3.5 and are trying to stress test our LMS.
> The problem is when our stress tester (Jmeter) inserts around 10,000 rows
> (in 3 hours) over 2 tables (5000 rows each table) the CPU of the sql server
> hits 100% over all 4 cores for all future inserts.
>
> I have tried numerous things to get the cpu back down but so far the only
> thing that works is deleting the 10,000 rows Jmeter inserted.
>
> For more information on the problem along with a time stamped list of test
> results and outcomes please see
> http://spreadsheets.google.com/pub?key=pu_k0R6vNvOVP26TRZdtdYw

Can you post the jmeter files? OR create a SQL test case? I haven't
had this problem myself, so I'm guessing something in your method or
something in your schema is setting something strange off. OR the
background writer is busy writing all the changes out after the fact
while the database is breathing from the heavy run. 10,000 rows over
three hours isn't really a whole lotta work unless those are really
wide rows.

Oh, what is an LMS?
On Sun, Feb 22, 2009 at 11:55 PM, Jordan Tomkinson <jordan@moodle.com> wrote:
> Hi list,
>
> We are running postgresql 8.3.5 and are trying to stress test our LMS.
> The problem is when our stress tester (Jmeter) inserts around 10,000 rows
> (in 3 hours) over 2 tables (5000 rows each table) the CPU of the sql server
> hits 100% over all 4 cores for all future inserts.

And just to clarify, this is user / system CPU usage, not IO wait, right?
One last thing. You were doing vacuum fulls but NOT reindexing, right?

I quote from the document at google docs:
13:50:00 vacuum full & analyze on all databases through pgadmin

1: Do you have evidence that regular autovacuum isn't keeping up?
2: If you have such evidence, and you have to vacuum full, vacuum full
doesn't really shrink indexes all that well.

For a heavily updated database, the 1, 2, 3 punch of autovacuum
(adjusted properly!), the background writer (adjusted properly)
smoothing things out, and the HOT updates reusing all that space
autovacuum is constantly reclaiming means you should be able to
avoid routine vacuum fulls. It's made a huge difference in db
maintenance for me.

Still I do find myself in vacuum full territory once or twice a year
(rogue update or something like that on a live database). If you do
have to vacuum full then reindex. OR cluster on your favorite index.
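[Editor's sketch of the "vacuum full then reindex" advice above in SQL. The table name is from the schema discussed later in the thread, but the index name is an assumed example; CLUSTER ... USING is the 8.3 syntax.]

```sql
-- VACUUM FULL compacts the table but tends to leave the indexes bloated,
-- so follow it with a REINDEX:
VACUUM FULL ANALYZE mdl_forum_posts;
REINDEX TABLE mdl_forum_posts;

-- Alternatively, CLUSTER rewrites the table and all its indexes in one
-- pass, physically ordering the heap by the chosen index
-- (mdl_forum_posts_pkey is a hypothetical index name):
CLUSTER mdl_forum_posts USING mdl_forum_posts_pkey;
```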
On Mon, Feb 23, 2009 at 4:03 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Feb 22, 2009 at 11:55 PM, Jordan Tomkinson <jordan@moodle.com> wrote:
>> Hi list,
>>
>> We are running postgresql 8.3.5 and are trying to stress test our LMS.
>> The problem is when our stress tester (Jmeter) inserts around 10,000 rows
>> (in 3 hours) over 2 tables (5000 rows each table) the CPU of the sql server
>> hits 100% over all 4 cores for all future inserts.
>
> And just to clarify, this is user / system CPU usage, not IO wait, right?
I am unable to post the jmeter file as it contains sensitive user/pass details, but the tests simply log in to a forum, create a new forum post, then log out.
SQL-wise this performs several SELECTs and 3 INSERTs over 3 different tables.
How does one create an SQL test case?
LMS is Learning Management System, in this case Moodle (moodle.org)
Yes this is user space CPU usage.
Running iostat -k 2 shows:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              31.50         0.00       456.00          0        912

so not a lot of disk writes.
On Mon, Feb 23, 2009 at 4:08 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> One last thing. You were doing vacuum fulls but NOT reindexing, right?
>
> I quote from the document at google docs:
> 13:50:00 vacuum full & analyze on all databases through pgadmin
>
> 1: Do you have evidence that regular autovacuum isn't keeping up?
> 2: If you have such evidence, and you have to vacuum full, vacuum full
> doesn't really shrink indexes all that well.
>
> For a heavily updated database, the 1, 2, 3 punch of autovacuum
> (adjusted properly!), the background writer (adjusted properly)
> smoothing things out, and the HOT updates reusing all that space
> autovacuum is constantly reclaiming, meaning you should be able to
> avoid routine vacuum fulls. It's made a huge difference in db
> maintenance for me.
>
> Still I do find myself in vacuum full territory once or twice a year
> (rogue update or something like that on a live database). If you do
> have to vacuum full then reindex. OR cluster on your favorite index.
I have no evidence of autovacuum not working, the manual full was done for purpose of elimination.
On Mon, Feb 23, 2009 at 12:18 AM, Jordan Tomkinson <jordan@moodle.com> wrote:
> On Mon, Feb 23, 2009 at 4:08 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> One last thing. You were doing vacuum fulls but NOT reindexing, right?
>> [snip]
>> Still I do find myself in vacuum full territory once or twice a year
>> (rogue update or something like that on a live database). If you do
>> have to vacuum full then reindex. OR cluster on your favorite index.
>
> I have no evidence of autovacuum not working, the manual full was done for
> purpose of elimination.

Oh, ok. If you're trying to make a fair benchmark, you should
probably reindex after vacuum full.
Oh yeah, what OS is this? Version and all that.
On Mon, Feb 23, 2009 at 4:29 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Oh yeah, what OS is this? Version and all that.
Red Hat Enterprise Linux 5.3 x64, kernel 2.6.18-128.el5
OS and hardware details are in the google spreadsheet, you might have to refresh it.
I'm working on getting the SQL log for you now.
On Mon, Feb 23, 2009 at 4:29 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Oh yeah, what OS is this? Version and all that.
I should probably clarify that the high CPU only exists while the jmeter tests are running; once the tests are finished the CPU returns to 0% (this isn't a production server yet, so there are no queries other than my tests).
I have not yet tried other SQL queries to see if they are affected. I suspect it may only be related to the two forum tables the test focuses on, but I may be incorrect - the database is filling up with data again now so I can test this tomorrow.
Scott,
DB Schema: http://demo.moodle.org/db_schema.txt
SQL Query log: http://demo.moodle.org/querylog.txt
There are _many_ more queries than I anticipated :/
Jordan
On Mon, Feb 23, 2009 at 12:42 AM, Jordan Tomkinson <jordan@moodle.com> wrote:
> On Mon, Feb 23, 2009 at 4:29 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> Oh yeah, what OS is this? Version and all that.
>
> I should probably clarify that the high cpu only exists while the jmeter
> tests are running, once the tests are finished the cpu returns to 0% (this
> isnt a production server yet, so no other queries other than my tests)
> I have not yet tried other SQL queries to see if they are affected, i
> suspect it may only be related to the two forum tables the test focuses on
> but I may be incorrect - the database is filling up with data again now so I
> can test this tomorrow.

Sorry, I had gotten the impression the CPU usage continued after the
test. That it's 100% during the test is quite understandable. So
does it start lower than 4x100%? Then climb during the tests? Is the
throughput dropping off over time?
On Mon, Feb 23, 2009 at 5:20 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, Feb 23, 2009 at 12:42 AM, Jordan Tomkinson <jordan@moodle.com> wrote:
>> On Mon, Feb 23, 2009 at 4:29 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>> Oh yeah, what OS is this? Version and all that.
>>
>> I should probably clarify that the high cpu only exists while the jmeter
>> tests are running, once the tests are finished the cpu returns to 0% (this
>> isnt a production server yet, so no other queries other than my tests)
>> I have not yet tried other SQL queries to see if they are affected, i
>> suspect it may only be related to the two forum tables the test focuses on
>> but I may be incorrect - the database is filling up with data again now so I
>> can test this tomorrow.
>
> Sorry, I had gotten the impression the CPU usage continued after the
> test. That it's 100% during the test is quite understandable. So
> does it start lower than 4x100%? Then climb during the tests? Is the
> throughput dropping off over time?
As per the spreadsheet (http://spreadsheets.google.com/pub?key=pu_k0R6vNvOVP26TRZdtdYw) CPU usage is around 50% and starts climbing over 3 hours until we have just under 10,000 rows of data then stays at 99% for the duration of all future tests.
Once the rows are removed the tests start back down at 50% usage again.
On Mon, Feb 23, 2009 at 1:29 AM, Jordan Tomkinson <jordan@moodle.com> wrote:
> On Mon, Feb 23, 2009 at 5:20 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> Sorry, I had gotten the impression the CPU usage continued after the
>> test. That it's 100% during the test is quite understandable. So
>> does it start lower than 4x100%? Then climb during the tests? Is the
>> throughput dropping off over time?
>
> As per the spreadsheet
> (http://spreadsheets.google.com/pub?key=pu_k0R6vNvOVP26TRZdtdYw) CPU usage
> is around 50% and starts climbing over 3 hours until we have just under
> 10,000 rows of data then stays at 99% for the duration of all future tests.
> Once the rows are removed the tests start back down at 50% usage again.

Oh, ok. Well that's pretty normal as the indexes grow large enough to
not fit in cache, then not fit in memory, etc... Are you noticing a
sharp dropoff in performance?
Hi,

Scott Marlowe wrote:
> Oh, what is an LMS?

A Learning Management System, not to be confused with a CMS, which might
also stand for a Course Management System ;-)

Regards

Markus Wanner
On Tue, Feb 24, 2009 at 12:35 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, Feb 23, 2009 at 1:29 AM, Jordan Tomkinson <jordan@moodle.com> wrote:
>> As per the spreadsheet
>> (http://spreadsheets.google.com/pub?key=pu_k0R6vNvOVP26TRZdtdYw) CPU usage
>> is around 50% and starts climbing over 3 hours until we have just under
>> 10,000 rows of data then stays at 99% for the duration of all future tests.
>> Once the rows are removed the tests start back down at 50% usage again.
>
> Oh, ok. Well that's pretty normal as the indexes grow large enough to
> not fit in cache, then not fit in memory, etc... Are you noticing a
> sharp dropoff in performance?
Again as per the spreadsheet, you can see the tests normally take around 130 seconds to complete, but after many rows are inserted they start to take upwards of 500 seconds. I can leave the server idle for days (over the weekend in fact) and start a new test to reproduce the results (500+ seconds), so I don't think it's a delayed write issue (surely 2 days is enough?)
What configuration options should I be looking at to make sure it fits in the cache? I have 8GB available and no matter how much I tweak I cannot get it using any more than 2GB. The DB is almost 4GB in size on disk, and as this is a dedicated sql server for just 1 database, I'd really like to have the whole thing in memory if possible.
On Mon, 23 Feb 2009, Scott Marlowe wrote:

> well that's pretty normal as the indexes grow large enough to not fit in
> cache, then not fit in memory, etc...

Right, the useful thing to do in this case is to take a look at how big
all the relations (tables, indexes) involved are at each of the steps in
the process. The script at http://wiki.postgresql.org/wiki/Disk_Usage
will show you that. That will give some feedback on whether the
vacuum/reindex methodology is really doing what you expect, and it will
also let you compare the size of the table/index with how much RAM is in
the system.

Have you done any tuning of the postgresql.conf file? If you haven't
increased shared_buffers substantially, you could be seeing buffer cache
churn as the CPU spends all its time shuffling buffers between PostgreSQL
and the OS once the working set involved exceeds around 32MB.

Shouldn't someone have ranted about RAID-5 by this point in the thread?

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
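[Editor's sketch: the wiki script Greg refers to boils down to a catalog query along these lines. This is a sketch in the same spirit, not the exact script from that page.]

```sql
-- List the 20 largest relations (tables and indexes) with readable sizes.
SELECT n.nspname || '.' || c.relname AS relation,
       pg_size_pretty(pg_relation_size(c.oid)) AS size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY pg_relation_size(c.oid) DESC
 LIMIT 20;
```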
On Tue, Feb 24, 2009 at 11:42 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Mon, 23 Feb 2009, Scott Marlowe wrote:
>> well that's pretty normal as the indexes grow large enough to not fit in
>> cache, then not fit in memory, etc...
>
> Right, the useful thing to do in this case is to take a look at how big all
> the relations (tables, indexes) involved are at each of the steps in the
> process. The script at http://wiki.postgresql.org/wiki/Disk_Usage will show
> you that. That will give some feedback on whether the vacuum/reindex
> methodology is really doing what you expect, and it will also let you
> compare the size of the table/index with how much RAM is in the system.
>
> Have you done any tuning of the postgresql.conf file? If you haven't
> increased shared_buffers substantially, you could be seeing buffer cache
> churn as the CPU spends all its time shuffling buffers between PostgreSQL
> and the OS once the working set involved exceeds around 32MB.
>
> Shouldn't someone have ranted about RAID-5 by this point in the thread?
Right, I have done some more testing and I think it's pretty conclusive.
1. Start with a known good copy of the database (some 3GB in size).
2. Run the Jmeter tests until ~7000 new rows are inserted equally over 3 tables. At this point performance goes to hell.
3. Delete the ~7000 rows from the db without re-indexing, (manually) analyzing or anything of the sort.
4. Performance instantly returns to that of before the tests began (optimum).
So I'm thinking, as Scott said, it could be buffer/cache size filling up?
Here is my postgresql.conf, perhaps someone can offer a few pointers.
The hardware is a quad Xeon 2.0GHz with 8GB RAM and 15K RPM SAS drives in RAID 5 (I know, RAID 5, don't tell me).
max_connections = 400
shared_buffers = 2048MB
temp_buffers = 8MB
max_prepared_transactions = 10
work_mem = 8MB
maintenance_work_mem = 128MB
max_stack_depth = 4MB
vacuum_cost_delay = 0
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 200
bgwriter_delay = 200ms
bgwriter_lru_maxpages = 100
bgwriter_lru_multiplier = 2.0
fsync = on
synchronous_commit = on
wal_sync_method = fsync
full_page_writes = on
wal_buffers = 128kB
wal_writer_delay = 200ms
commit_delay = 0
commit_siblings = 5
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
track_counts = on
log_parser_stats = off
log_planner_stats = off
log_executor_stats = off
log_statement_stats = off
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_freeze_max_age = 200000000
autovacuum_vacuum_cost_delay = 20
autovacuum_vacuum_cost_limit = -1
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
escape_string_warning = off
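[Editor's note: one cache-related setting absent from the excerpt above is effective_cache_size, which defaults to a small value on 8.3. A sketch of the relevant lines for an 8GB dedicated box might look like this; the values are illustrative assumptions, not a recommendation from the thread.]

```
# Illustrative values only, for an 8GB dedicated server.
shared_buffers = 2048MB         # PostgreSQL's own buffer cache
effective_cache_size = 6GB      # planner hint: shared_buffers plus expected
                                # OS page cache; allocates no memory itself,
                                # but influences index-scan cost estimates
```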
On Tue, Feb 24, 2009 at 11:42 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> Right, the useful thing to do in this case is to take a look at how big all
> the relations (tables, indexes) involved are at each of the steps in the
> process. The script at http://wiki.postgresql.org/wiki/Disk_Usage will show
> you that. That will give some feedback on whether the vacuum/reindex
> methodology is really doing what you expect, and it will also let you
> compare the size of the table/index with how much RAM is in the system.
taken before the ~7000 rows were entered.
           relation            |  size
-------------------------------+--------
 public.mdl_log                | 595 MB
 public.mdl_forum_posts        | 375 MB
 public.mdl_log_coumodact_ix   | 197 MB
 public.mdl_user               | 191 MB
 public.mdl_cache_text         | 162 MB
 public.mdl_log_usecou_ix      | 137 MB
 public.mdl_log_act_ix         | 119 MB
 public.mdl_log_cmi_ix         | 97 MB
 public.mdl_log_tim_ix         | 97 MB
 public.mdl_log_id_pk          | 97 MB
 public.mdl_question_states    | 48 MB
 public.mdl_stats_user_daily   | 48 MB
 public.mdl_hotpot_responses   | 47 MB
 public.mdl_register_downloads | 45 MB
 public.mdl_message_read       | 37 MB
 public.mdl_course_display     | 37 MB
 public.mdl_stats_user_weekly  | 31 MB
 public.mdl_mnet_log           | 27 MB
 public.mdl_user_ema_ix        | 26 MB
 public.mdl_regidown_url_ix    | 23 MB
(20 rows)
On Tue, Feb 24, 2009 at 12:40 AM, Jordan Tomkinson <jordan@moodle.com> wrote:
> taken before the ~7000 rows were entered.
>
>            relation            |  size
> -------------------------------+--------
>  public.mdl_log                | 595 MB
>  public.mdl_forum_posts        | 375 MB
> [snip]
> (20 rows)

What's more interesting is how quickly they grow during your test.
I'm betting that as public.mdl_log and public.mdl_forum_posts grow,
you get a dataset larger than memory. There are two levels of caching
that pgsql uses: the highest and closest to pgsql is the shared_buffer
cache, and the next is the kernel level file system cache. While it's
still way faster to hit the kernel level of file cache than to hit the
actual hard drives, the pg shared_buffers is the fastest.
You may be in a situation where giving a bit more memory to pg will
help, but with a 4G dataset and 8G of ram you're cutting it close. You
need a few gig for sorts and processes and such like that. Going to
16Gig you could set shared_buffers at somewhere in the 4 to 8Gig range
and it might work out.

If you're looking at scaling to large amounts of data, you can't plan
on it all fitting into memory, and you have to start planning for
faster Disk I/O. This means more disks, fast RAID controllers with
optional battery backed cache (not really optional) and / or kernel
level RAID, for read mostly stuff it's quite fast. As expensive as 16
or 24 or 32 fast hard drives are, they're cheaper than servers with a
half terabyte of ram or whatever you'd need for a big dataset.

First things first, I'd try increasing shared_buffers to the just over
4G range. I'd check after each run with vacuum verbose (NOT FULL) to
see how bloated my db was getting.
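[Editor's sketch of the "vacuum verbose (NOT FULL)" check Scott suggests, run per table. The table names are taken from the size listing earlier in the thread; the exact output format varies by version.]

```sql
-- Reports how many dead row versions were found and removed, and how many
-- pages the table and its indexes occupy -- a rough bloat gauge between runs.
VACUUM VERBOSE public.mdl_forum_posts;
VACUUM VERBOSE public.mdl_log;
```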
* Greg Smith <gsmith@gregsmith.com> [090201 00:00]:

> Shouldn't someone have ranted about RAID-5 by this point in the thread?

What? Sorry, I wasn't paying attention...

You mean someone's actually still using RAID-5?

;-)

--
Aidan Van Dyk                        Create like a god,
aidan@highrise.ca                    command like a king,
http://www.highrise.ca/              work like a slave.
On Wed, Feb 25, 2009 at 12:05 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> * Greg Smith <gsmith@gregsmith.com> [090201 00:00]:
>> Shouldn't someone have ranted about RAID-5 by this point in the thread?
>
> What? Sorry, I wasn't paying attention...
>
> You mean someone's actually still using RAID-5?
>
> ;-)
What exactly is wrong with RAID5 and what should we have gone with?
On Wed, 2009-02-25 at 09:21 +0900, Jordan Tomkinson wrote:
> What exactly is wrong with RAID5 and what should we have gone with?

RAID5 outside of RAID 0 is the worst possible RAID level to run with a
database (of the commonly used raid levels, that is).

It is very, very slow on random writes, which is what databases do.
Switch to RAID 10.

Sincerely,

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
--- On Wed, 2/25/09, Jordan Tomkinson <jordan@moodle.com> wrote:
<...>
> What exactly is wrong with RAID5 and what should we have
> gone with?

RAID10 is often used. As others have pointed out, RAID5 is very slow
for random writes. It also has issues that can expose your data to
total loss, see for instance
<http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt>.

HTH,

Greg Williamson
On Wed, Feb 25, 2009 at 9:23 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> RAID5 outside of RAID 0 is the worst possible RAID level to run with a
> database. (of the commonly used raid level's that is).
> It is very, very slow on random writes which is what databases do.
> Switch to RAID 10.
Surely, being (real) hardware RAID with 15k RPM disks, this wouldn't be a huge issue unless a large amount of data was being written?
On Tue, Feb 24, 2009 at 5:21 PM, Jordan Tomkinson <jordan@moodle.com> wrote:
> What exactly is wrong with RAID5 and what should we have gone with?

RAID 5 is only suitable for situations where you need maximum storage
for minimum cost and the database is mostly / all read all the time.
Like large reporting databases. It's slow on writes, and it has a low
tolerance for dead drives (2 and it's all gone).

HOWEVER. RAID-10, which is theoretically MUCH better, is only better
if it's implemented right, and a lot of cheap RAID controllers don't
do any better running RAID-10. Many of these can be put into JBOD mode
where you do RAID-10 in the kernel, or you can do RAID-1 on the card
(x sets) and RAID-0 in the kernel.

RAID-10 is almost always the right choice when you're buying good
controllers and fast drives and you want maximum performance. If you
REALLY need a lot of storage, and you have to use something like RAID
5, at least look at RAID 6.
On Wed, 2009-02-25 at 09:44 +0900, Jordan Tomkinson wrote:
> surely being (real) hardware raid with 15k rpm disks this wouldn't be
> a huge issue unless a large amount of data was being written ?

Tests done by Mark Wong on a 3 disk 15k scsi versus 4 disk raid 10
scsi show that RAID 10 is on average 30% faster.

Sincerely,

Joshua D. Drake
On Tue, Feb 24, 2009 at 4:40 PM, Jordan Tomkinson <jordan@moodle.com> wrote:
> taken before the ~7000 rows were entered.
>
>            relation            |  size
> -------------------------------+--------
>  public.mdl_log                | 595 MB
>  public.mdl_forum_posts        | 375 MB
>  public.mdl_log_coumodact_ix   | 197 MB
>  public.mdl_user               | 191 MB
>  public.mdl_cache_text         | 162 MB
>  public.mdl_log_usecou_ix      | 137 MB
>  public.mdl_log_act_ix         | 119 MB
>  public.mdl_log_cmi_ix         | 97 MB
>  public.mdl_log_tim_ix         | 97 MB
>  public.mdl_log_id_pk          | 97 MB
>  public.mdl_question_states    | 48 MB
>  public.mdl_stats_user_daily   | 48 MB
>  public.mdl_hotpot_responses   | 47 MB
>  public.mdl_register_downloads | 45 MB
>  public.mdl_message_read       | 37 MB
>  public.mdl_course_display     | 37 MB
>  public.mdl_stats_user_weekly  | 31 MB
>  public.mdl_mnet_log           | 27 MB
>  public.mdl_user_ema_ix        | 26 MB
>  public.mdl_regidown_url_ix    | 23 MB
> (20 rows)
Taken after 9000 rows entered, by this stage performance is terrible.
           relation            |  size
-------------------------------+--------
 public.mdl_log                | 597 MB
 public.mdl_forum_posts        | 389 MB
 public.mdl_log_coumodact_ix   | 198 MB
 public.mdl_user               | 193 MB
 public.mdl_cache_text         | 162 MB
 public.mdl_log_usecou_ix      | 137 MB
 public.mdl_log_act_ix         | 119 MB
 public.mdl_log_cmi_ix         | 98 MB
 public.mdl_log_tim_ix         | 97 MB
 public.mdl_log_id_pk          | 97 MB
 public.mdl_question_states    | 48 MB
 public.mdl_stats_user_daily   | 48 MB
 public.mdl_hotpot_responses   | 47 MB
 public.mdl_register_downloads | 45 MB
 public.mdl_message_read       | 37 MB
 public.mdl_course_display     | 37 MB
 public.mdl_stats_user_weekly  | 31 MB
 public.mdl_mnet_log           | 27 MB
 public.mdl_user_ema_ix        | 26 MB
 public.mdl_regidown_url_ix    | 23 MB
(20 rows)
As you can see, the tables aren't growing by much (only a few MB), so is this really to do with buffer/cache size?
I set shared_buffers to 3072MB (from 2048MB) and it hasn't made much improvement; requests are still taking longer and longer to execute.
Joshua D. Drake wrote:
> On Wed, 2009-02-25 at 09:21 +0900, Jordan Tomkinson wrote:
>> What exactly is wrong with RAID5 and what should we have gone with?
>
> RAID5 outside of RAID 0 is the worst possible RAID level to run with a
> database. (of the commonly used raid level's that is).
>
> It is very, very slow on random writes which is what databases do.
> Switch to RAID 10.

On top of the stuff Joshua wrote, there's also the "RAID 5 Write Hole".

Quoting Wikipedia:
"In the event of a system failure while there are active writes, the
parity of a stripe may become inconsistent with the data. If this is
not detected and repaired before a disk or block fails, data loss may
ensue as incorrect parity will be used to reconstruct the missing block
in that stripe. This potential vulnerability is sometimes known as the
write hole. Battery-backed cache and similar techniques are commonly
used to reduce the window of opportunity for this to occur."

And in more detail from http://blogs.sun.com/bonwick/entry/raid_z:
"RAID-5 write hole... What's worse, it will do so silently -- it has no
idea that it's giving you corrupt data."

I sometimes wonder if postgres should refuse to start up on RAID-5 in
the same way it does on VFAT or running as root. :-)