Thread: CPU 0.1% IOWAIT 99% for decision-support queries

CPU 0.1% IOWAIT 99% for decision-support queries

From
"Patrick Vedrines"
Date:
Hi everyone,
 
I'm developing a web decision-support application based on:
-Red Hat 3 ES
-PostgreSQL 8.0.1
-Dell PowerEdge 2850, 2 GB RAM, 2 processors at 3 GHz with 1 MB cache, and 4 disks (ext3, 10,000 rpm)
I am alone on the box and there are no cron jobs running.
 
I have 2 databases (A and B) with exactly the same schema:
-one main table called "aggregate", with no indexes, serving only SELECT statements (reloaded once a month with a new batch of data). Row size is about 200 bytes (50 columns of type char(x) or integer)
-and several small 'reference' tables, not shown in the following example for clarity.
-Database A: aggregate contains 2,300,000 records (about 500 MB)
-Database B: aggregate contains 9,000,000 records (about 2 GB)
 
There is no index on the aggregate table since the criteria, their number, and their scope are freely chosen by the customers.
 
The query :
        select  sum(ca) 
        from aggregate   
        where  (issue_date >= '2004-01' and issue_date <= '2004-02' )
;
takes 5 s on database A (5 min 30 s* the first time, probably while the cache fills)
and 21 min* on database B (whether or not it is the first time).
 
EXPLAIN shows a sequential scan, of course:
 Aggregate  (cost=655711.85..655711.85 rows=1 width=4)
   ->  Seq Scan on "aggregate"  (cost=0.00..647411.70 rows=3320060 width=4)
         Filter: ((issue_date >= '2004-01'::bpchar) AND (issue_date <= '2004-02'::bpchar))
 
*Here is the 'top' display for these response times:
91 processes: 90 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0,0%    0,0%    0,2%   0,1%     0,0%   48,6%   51,0%
           cpu00    0,0%    0,0%    0,0%   0,0%     0,0%    0,0%  100,0%
           cpu01    0,0%    0,0%    1,0%   0,0%     0,0%   99,0%    0,0%
           cpu02    0,0%    0,0%    0,0%   0,5%     0,0%    0,0%   99,5%
           cpu03    0,0%    0,0%    0,0%   0,0%     0,0%   95,5%    4,5%
Mem:  2061424k av, 2043944k used,   17480k free,       0k shrd,    6104k buff
                   1551692k actv,  172496k in_d,   30452k in_c
Swap: 2096440k av,       0k used, 2096440k free                 1792852k cached
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
21983 postgres  20   0  9312 9312  8272 D     0,2  0,4   0:00   1 postmaster
    1 root      15   0   488  488   432 S     0,0  0,0   0:06   2 init
    2 root      RT   0     0    0     0 SW    0,0  0,0   0:00   0 migration/0
For the 5 s response time, the 'top' command shows 0% iowait and 25% CPU.
 
 
- I guess this is a cache issue, but how can I manage/control it?
Does Postgres manage its own cache, or does it use the OS cache?
 
- Is relying on the cache a good approach?
It does not seem to work for large databases: I tried several different values in postgresql.conf and /proc/sys/kernel/shmmax without measuring any improvement in response time (for example: shared_buffers = 190000, sort_mem = 4096, effective_cache_size = 37000 and kernel/shmmax = 1200000000).
Do I have to upgrade the RAM to 6 GB and/or buy faster disks (of what type?)?
Moreover, a query on database B will destroy the cache previously built for database A, increasing the response time of the next query on database A. And I in fact have 15 databases!
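 
For reference, the values the running server actually uses can be read back from a psql session (a minimal check; shared_buffers only changes after a restart):
        -- check which settings the postmaster really started with
        SHOW shared_buffers;
        SHOW effective_cache_size;
        SHOW ALL;    -- lists every active setting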
 
- In my case, what would be the best combination of postgresql.conf parameters and /proc/sys/kernel/shmmax?
 
- Is there a way to reduce the size of the "aggregate" table files (1 GB + 1 GB + 1 GB + 0.8 GB = 3.8 GB on disk for the "aggregate" table, instead of the expected 2 GB = 200 bytes * 9,000,000 records) by playing with the data types or other parameters (fillfactor?)?
VACUUM (even FULL) seems useless, since the aggregate table only ever sees 'COPY aggregate FROM' and 'SELECT'.
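 
For reference, a minimal way to see what the catalog itself reports for this table (assuming only the default 8 kB block size; relpages and reltuples are refreshed by VACUUM or ANALYZE):
        ANALYZE "aggregate";
        SELECT relname,
               relpages,                                      -- 8 kB blocks on disk
               reltuples,                                     -- estimated row count
               relpages::bigint * 8192 AS approx_bytes,       -- on-disk size
               (relpages::bigint * 8192) / reltuples::bigint AS bytes_per_row
        FROM   pg_class
        WHERE  relname = 'aggregate';
Part of the gap between 200 bytes of user data and the measured bytes per row is expected: every row carries a fixed header, and char(x) columns are space-padded to their declared length.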
 
- Is it possible to define a sort of RAM filesystem (as exists in DOS/Windows) that I could create and populate my databases into? ...since the databases are never updated by this application.
 
Sorry for my naive questions and my poor English, but any help or advice will be greatly appreciated!
 
Patrick Vedrines
 
PS (maybe of interest to some users like me):
I created a partition on a new, similar disk but on the last cylinders (near the periphery) and copied database B into it: the response time is 25% faster (i.e. 15 min instead of 21 min). But 15 min is still too long for my customers (5 min would be nice).
 
 
 
 

Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
Gustavo F Nobrega - Planae
Date:
Hi Patrick,

    How is your disk array configured? Do you have a PERC 4?

Tip: Use ReiserFS instead of ext3, RAID 0+1, and the deadline I/O scheduler in Linux kernel 2.6.
Kind regards,

Gustavo Franklin Nóbrega
Infrastructure and Databases
Planae Tecnologia da Informação
(+55) 14 3224-3066 Ramal 209
www.planae.com.br



Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
Richard Huxton
Date:
Patrick Vedrines wrote:
> Hi everyone,
>
> I'm developing a web decision-support application based on -Red Hat 3 ES
> -Postgresql 8.0.1 -Dell PowerEdge 2850, 2 GB RAM, 2 processors, 3 GHz, 1 MB
> cache and 4 disks (ext3, 10,000 rpm). I am alone on the box and there
> are no cron jobs.
>
> I have 2 databases (A and B) with exactly the same schema: -one main
> table called "aggregate" with no indexes, serving only SELECT
> statements (reloaded once a month with a new batch of data).

Perhaps look into clustering the tables.

> Row size is about 200 bytes (50 columns of type char(x) or integer)
> -and several small 'reference' tables not shown in the following
> example for clarity. -Database A: aggregate contains 2,300,000 records
> (about 500 MB) -Database B: aggregate contains 9,000,000 records
> (about 2 GB)
>
> There is no index on the aggregate table since the criteria, their
> number and their scope are freely chosen by the customers.

Hmm... not convinced this is a good idea.

> The query : select  sum(ca) from aggregate where  (issue_date >=
> '2004-01' and issue_date <= '2004-02' ); takes 5 s on database A
> (5 min 30 s* the first time, probably while the cache fills) and 21 min*
> on database B (whether or not it is the first time).

Because A fits in the cache and B doesn't.

> - I guess this is a cache issue but how can I manage/control it ? Does
> Postgres manage its own cache or does it use the OS cache ?

Both

> - Is using the cache a good approach? It does not seem to work for
> large databases: I tried several different values for postgresql.conf
> and /proc/sys/kernel/shmmax without detecting any response time
> enhancement (For example : shared_buffers = 190000 , sort_mem = 4096
> , effective_cache_size = 37000 and kernel/shmmax=1200000000 ) Do I
> have to upgrade the RAM to 6 GB and/or buy faster disks (of what type?) ?
>  Moreover, a query on database B will destroy the cache previously
> built for database A, increasing the response time for the next query
> on database A. And I have in fact 15 databases !

If you don't have any indexes and the table isn't clustered then PG has
no choice but to scan the entire table for every query. As you note,
that's going to destroy your cache. You can increase the RAM but sooner
or later, you'll get the same problem.

> - In my case, what would be the best combination of postgresql.conf
> parameters and /proc/sys/kernel/shmmax ?

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
http://www.powerpostgresql.com/PerfList

> - is there a way to reduce the size of the "aggregate" table files
> (1 GB + 1 GB + 1 GB + 0.8 GB = 3.8 GB for the "aggregate" table instead
> of 2 GB = 200 * 9,000,000 records) by playing with the data types or
> other parameters (fillfactor ?). Vacuum (even full) seems to be
> useless since the aggregate table supports only 'copy aggregate from'
> and 'select'.

You can replace int4 with int2 and so on (where possible) but that will
only delay problems.
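
For illustration, something along these lines (the column names here are
invented, so adjust to the real schema; 8.0 can change a column's type in
place, at the cost of rewriting the table, and alignment padding may absorb
part of the saving):

   -- hypothetical columns: a small counter stored as int4 and a
   -- blank-padded char(10) code
   ALTER TABLE "aggregate" ALTER COLUMN quantity     TYPE smallint;    -- int4 -> int2
   ALTER TABLE "aggregate" ALTER COLUMN product_code TYPE varchar(10); -- drops space padding
   ANALYZE "aggregate";   -- refresh statistics after the rewrite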

> - is it possible to define a sort of RAM filesystem (as it exists in
> DOS/Windows) which I could create and populate my databases into ?
> ...since the databases are never updated by this application.

Won't help - your cache is already doing that. Some things you can do
(in order of effort):

1. Cluster the large tables
2. Analyse your customers' queries and try a couple of indexes - some
choices will be more common than others.
3. Split your tables into two - common fields, uncommon fields, that way
filtering on the common fields might take less space.
4. Split your tables by date, one table per month or year. Then re-write
your customers' queries on-the-fly to select from the right table. Will
only help with queries on date of course.
5. Place each database on its own machine or virtual machine so they
don't interfere with each other.

I'd start with items 1 and 2 and see if that helps, though.
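
To make items 1 and 2 concrete, a minimal sketch (the index name is
invented; on 8.0 CLUSTER names the index first, and since the table is
reloaded monthly the CLUSTER step has to be repeated after each COPY):

   -- item 2: index the most commonly filtered column
   CREATE INDEX aggregate_issue_date_idx ON "aggregate" (issue_date);

   -- item 1: physically reorder the table along that index, so a
   -- date-range query reads a contiguous slice of the file instead of
   -- the whole table; CLUSTER locks and rewrites the table
   CLUSTER aggregate_issue_date_idx ON "aggregate";
   ANALYZE "aggregate";

   -- check whether the planner now chooses an index scan for the range
   EXPLAIN SELECT sum(ca)
   FROM "aggregate"
   WHERE issue_date >= '2004-01' AND issue_date <= '2004-02';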

PS - it might make sense to have an unusually large shared_mem for PG,
but I'm not familiar enough with the changes in the cache handling in
8.0 to say for sure.
PPS - there are more changes coming for 8.1, but I know even less about
those.

--
   Richard Huxton
   Archonet Ltd

Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
Simon Riggs
Date:
On Tue, 2005-03-22 at 19:08 +0100, Patrick Vedrines wrote:
> I have 2 databases (A and B) with exactly the same schemas:
> -one main table called "aggregate" with no indexes, serving only
> SELECT statements (reloaded once a month with a new batch of data).
> Row size is about 200 bytes (50 columns of type char(x) or integer)
> -and several small 'reference' tables not shown by the following
> example for clarity reasons.
> -Database A : aggregate contains 2,300,000 records ( 500 Mb)
> -Database B : aggregate contains 9,000,000 records ( 2 Gb)

> (For example : shared_buffers = 190000 , sort_mem = 4096 ,
> effective_cache_size = 37000 and kernel/shmmax=1200000000 )
> Do I have to upgrade the RAM to 6Gb or/and buy faster HD (of what
> type?) ?

Setting shared_buffers that high will do you no good at all, as Richard
suggests.

You've got 1.5 GB of shared_buffers and > 2 GB of data. In 8.0, the scan will
hardly use the cache at all, nor will it ever, since the data is bigger
than the cache. Notably, the scan of B should NOT spoil the cache for
A...

Priming the cache is quite hard...but not impossible.

What will kill you on a shared_buffers that big is the bgwriter, which
you should turn off by setting bgwriter_maxpages = 0

> PS (maybe of interest for some users like me) :
> I created a partition on a new similar disk but on the last cylinders
> (near the periphery) and copied the database B into it: the response
> time is 25% faster (i.e. 15 min instead of 21 min). But 15 min is still too
> long for my customers (5 min would be nice).

Sounds like your disks/layout/something is pretty sick. You don't
mention I/O bandwidth, controller or RAID, so you should look more into
those topics.

On the other hand... just go for more RAM, as you suggest... but you
should create a RAM disk rather than use an overly large
shared_buffers... that way your data is always in RAM, rather than
maybe in RAM.
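
For what it's worth, a minimal sketch of the PostgreSQL side of that,
assuming the sysadmin has already mounted a tmpfs at a made-up path such
as /mnt/pgram and handed it to the postgres user (8.0 tablespaces; the
contents vanish at reboot, so the monthly COPY load has to be redone or
the table moved back afterwards):

   -- the directory must exist, be empty and be owned by the postgres
   -- OS user; path and tablespace name are only examples
   CREATE TABLESPACE ramspace LOCATION '/mnt/pgram';

   -- move the read-only table into RAM (this rewrites the whole table)
   ALTER TABLE "aggregate" SET TABLESPACE ramspace;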

Best Regards, Simon Riggs


Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
"Patrick Vedrines"
Date:
Hello Gustavo,
 
Your question seems to suggest that you suspect a disk issue, and a few hours later Simon told me "Sounds like your disks/layout/something is pretty sick".
To be clear in my own mind about it, I've just copied (time cp) the "aggregate" table files (4 GB) from one disk to another: it takes 22 min! (3 MB/s).
That seems to demonstrate that Postgres is not the cause of this issue.
I've just entrusted the analysis of my disks to my system engineer...
 
In case I have to change my disks, do you have any performance figures for the filesystems you mentioned (ReiserFS vs ext3)?
I don't use RAID since security is not a concern.
 
Thanks a lot for your help!
 
Patrick
 

Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
"Patrick Vedrines"
Date:
Hello Simon,
 
> Sounds like your disks/layout/something is pretty sick. You don't
> mention I/O bandwidth, controller or RAID, so you should look more into
> those topics.
Well seen! (as we say in France).
As I said to Gustavo, your suspicion led me to a simple disk test: I've just copied (time cp) the "aggregate" table files (4 GB) from one disk to another: it takes 22 min! (3 MB/s).
That seems to demonstrate that Postgres is not the cause of this issue.
I've just entrusted the analysis of my disks to my system engineer...
 
> Setting shared_buffers that high will do you no good at all, as Richard
> suggests.
I (and my own tests) agree with you.

> You've got 1.5 GB of shared_buffers and > 2 GB of data. In 8.0, the scan will
> hardly use the cache at all, nor will it ever, since the data is bigger
> than the cache. Notably, the scan of B should NOT spoil the cache for A
Are you sure of that? Is Postgres able to tell the OS: "don't use the cache for this query"?

> Priming the cache is quite hard...but not impossible.
> What will kill you on a shared_buffers that big is the bgwriter, which
> you should turn off by setting bgwriter_maxpages = 0
Is the bgwriter relevant, given that my application only issues SELECTs?

>
> On the other hand...just go for more RAM, as you suggest...but you
> should create a RAMdisk, rather than use too large
> shared_buffers....that way your data is always in RAM, rather than maybe
> in RAM.
I am not a Linux expert: is it possible (and how) to create a RAM disk?
 
 

Thanks a lot for your help!
 
Patrick

Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
"Patrick Vedrines"
Date:
Hello Richard,

> Perhaps look into clustering the tables.
Good idea: I will look further into this.

> > There is no index on the aggregate table since the criterias, their
> > number and their scope are freely choosen by the customers.
>
> Hmm... not convinced this is a good idea.
Long ago, when my application used Informix, I tried to index the
aggregate table: it was a nightmare to manage all these indexes (and their
volume) for an uncertain benefit.

> If you don't have any indexes and the table isn't clustered then PG has
> no choice but to scan the entire table for every query. As you note,
> that's going to destroy your cache. You can increase the RAM but sooner
> or later, you'll get the same problem.
I agree with you: your remarks convince me not to rely on the cache.

> 3. Split your tables into two - common fields, uncommon fields, that way
> filtering on the common fields might take less space.
> 4. Split your tables by date, one table per month or year. Then re-write
> your customers' queries on-the-fly to select from the right table. Will
> only help with queries on date of course.
That would force me to rewrite my query generator, which is already a very
complex program (in fact, the heart of the system).

> 5. Place each database on its own machine or virtual machine so they
> don't interfere with each other.
I'm afraid I don't have the money for that. As Simon and Gustavo suggested,
I will check my SCSI disks first.


Thanks a lot for your advice!

Best regards,

Patrick



Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
Gustavo Franklin Nóbrega - Planae
Date:
Good day Patrick!
   
    I can help you design your disk layout for better performance and security. Please tell me how many disks you have (and some specs, like capacity and RPM).

    If you want to know more, there is a very interesting article about filesystem benchmarking ( http://linuxgazette.net/102/piszcz.html ). In this article, ReiserFS 3.6, JFS and XFS are at roughly the same level at the top, depending on your application, and ext3 is slower than the others. I believe that version 4 of ReiserFS is better than version 3.6, but I have not yet been able to test it.

    RAID 0 (striping, more at http://www.pcguide.com/ref/hdd/perf/raid/levels/singleLevel0-c.html) or RAID 0+1 (striping + mirroring, more at http://www.pcguide.com/ref/hdd/perf/raid/levels/multLevel01-c.html) is very interesting for PostgreSQL. RAID 0 provides only performance, while RAID 0+1 provides performance and security. Take a look at these articles and think about using RAID (http://www.pcguide.com/ref/hdd/perf/raid/).

    I'm glad to help. Best regards!
Kind regards,

Gustavo Franklin Nóbrega
Infrastructure and Databases
Planae Tecnologia da Informação
(+55) 14 3106-3514
http://www.planae.com.br




Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
"Patrick Vedrines"
Date:
Great!
 
I'm not an expert, but as far as I know my 15 databases are spread over 4 SCSI RAID disks (73 GB, 10K RPM) mounted with ext3.
I remember that they were provided by Dell configured as RAID 5, and I asked my system engineer to switch them to plain SCSI because I don't care about security, only about speed and capacity (maybe this switch was not done properly at the time...).
 
Thank you for these interesting links: I've sent them to my system engineer with both hands!
 
Best regards
 
Patrick

Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
Gustavo Franklin Nóbrega - Planae
Date:
Hi!

    I have a Dell PowerEdge 2600 with a PERC 4/DI and 4 SCSI disks of 35 GB. I have made a RAID 0+1 array with the 4 disks, because it is a mission-critical application. But for yours, you can configure RAID 0, which is faster than RAID 5 for 4 disks.

    Ask your system engineer which Linux distribution and kernel version are used, and whether he has done any kernel tuning for your hardware and application.

Best regards.
Kind regards,

Gustavo Franklin Nóbrega
Infrastructure and Databases
Planae Tecnologia da Informação
(+55) 14 3106-3514
http://www.planae.com.br




Re: CPU 0.1% IOWAIT 99% for decision-support queries

From
Simon Riggs
Date:
On Thu, 2005-03-24 at 10:48 +0100, Patrick Vedrines wrote:
> > You've got 1.5 GB of shared_buffers and > 2 GB of data. In 8.0, the
> > scan will hardly use the cache at all, nor will it ever, since the
> > data is bigger than the cache. Notably, the scan of B should NOT
> > spoil the cache for A
> Are you sure of that? Is Postgres able to tell the OS: "don't use the
> cache for this query"?

PostgreSQL 8.0 has the ARC algorithm which prevents cache spoiling of
the shared_buffers, but has no direct influence over the OS cache.

> > Priming the cache is quite hard...but not impossible.
> > What will kill you on a shared_buffers that big is the bgwriter, which
> > you should turn off by setting bgwriter_maxpages = 0
> Is the bgwriter relevant, given that my application only issues SELECTs?

With very large shared_buffers the bgwriter's default settings are a
problem. You don't need it, so I suggest turning it off.

Best Regards, Simon Riggs