Thread: Vacuum Problems

Vacuum Problems

From
"Rafael Domiciano"
Date:
Hi there,

I need some help or just some hints. I am having problems with vacuum full in one table only: "clifatura".
That table has today around 7 million rows.

I scheduled on the server a cron job to run VACUUM FULL every day at 23 P.M, but on the following day, at 8 A.M. vacuum is yet working on "clifatura"; so I have to kill vacuum.

I read something about maintenance_work_mem, so I increased it to 128MB. My server is a Intel Xeon Dual-Core 2.8 Ghz with 2 Gb RAM, running Postgres 8.2.4.

In others times recreating the table is Ok, but it can't be the right thing to do.

I hope anyone have some information.

Thanks,

Rafael Domiciano

Re: Vacuum Problems

From
Rafael Martinez
Date:
Rafael Domiciano wrote:

Hei

> I need some help or just some hints. I am having problems with vacuum
> full in one table only: "clifatura".
> That table has today around 7 million rows.
>

How big is the database?

> I scheduled on the server a cron job to run VACUUM FULL every day at 23
> P.M, but on the following day, at 8 A.M. vacuum is yet working on
> "clifatura"; so I have to kill vacuum.
>

Do you have any locks in this table/database that prevent the vacuum to
be started?

Why do you use VACUUM FULL everyday? This will lock your database until
it finish. I can count with one hand the times we had to run vacuum full
in our systems in the last 6 years.

Use a normal vacuum analyze if 'autovacuum' is not enough for you.

--
 Rafael Martinez, <r.m.guerrero@usit.uio.no>
 Center for Information Technology Services
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/

Re: Vacuum Problems

From
"Rafael Domiciano"
Date:
The database has around 40 Gb.

If I not use vacuum full everyday the database get very slow.

There is no deadlock on the database.
The vacuum does a clean in the table and in every index of the table "clifatura". And in the end of the vacuum, seems that vacuum is working hard on the table (Vacuum is using CPU and memory), but doesn't end.

INFO:  limpando "public.clifatura"
INFO:  "clifatura": encontrados 63 versões de registros removíveis e 6906667 não-removíveis em 238511 páginas
DETAIL:  0 versões de registros não vigentes não podem ser removidas ainda.
Versões de registros não-removíveis estão no intervalo de 168 a 428 bytes.
Havia 262348 ponteiros de itens não utilizados.
Espaço livre total (incluindo versões de registros removíveis) é 99117652 bytes.
2351 páginas estão ou estarão vazias, incluindo 0 no fim da tabela.
95651 páginas contendo 87437816 bytes livres são destinos potenciais.
CPU 2.86s/1.08u sec elapsed 92.21 sec.
INFO:  índice "caeindclf_01" agora contém 6906667 versões de registros em 39824 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
0 páginas de índice foram removidas, 0 são reutilizáveis.
CPU 0.58s/3.18u sec elapsed 26.82 sec.
INFO:  índice "caeindclf_03" agora contém 6906667 versões de registros em 43174 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
0 páginas de índice foram removidas, 0 são reutilizáveis.
CPU 0.64s/3.18u sec elapsed 23.62 sec.
INFO:  índice "caeindclf_04" agora contém 6906667 versões de registros em 36346 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
0 páginas de índice foram removidas, 0 são reutilizáveis.
CPU 0.50s/2.63u sec elapsed 19.97 sec.
INFO:  índice "caeindclf_05" agora contém 6906667 versões de registros em 35965 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
0 páginas de índice foram removidas, 0 são reutilizáveis.
CPU 0.56s/3.06u sec elapsed 12.40 sec.
INFO:  índice "caeindclf_06" agora contém 6906667 versões de registros em 32085 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
0 páginas de índice foram removidas, 0 são reutilizáveis.
CPU 0.49s/3.08u sec elapsed 10.68 sec.
INFO:  índice "caeindclf_11" agora contém 6906667 versões de registros em 28596 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
867 páginas de índice foram removidas, 867 são reutilizáveis.
CPU 0.39s/2.11u sec elapsed 9.40 sec.
INFO:  índice "caeindclf_12" agora contém 6906667 versões de registros em 46721 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
287 páginas de índice foram removidas, 287 são reutilizáveis.
CPU 0.57s/2.59u sec elapsed 21.90 sec.
INFO:  índice "caeindclf_13" agora contém 6906667 versões de registros em 28658 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
336 páginas de índice foram removidas, 336 são reutilizáveis.
CPU 0.33s/2.19u sec elapsed 17.31 sec.
INFO:  índice "caeindclf_16" agora contém 6906667 versões de registros em 16565 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
399 páginas de índice foram removidas, 399 são reutilizáveis.
CPU 0.18s/1.88u sec elapsed 13.29 sec.
INFO:  índice "caeindclf_17" agora contém 6906667 versões de registros em 28676 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
1433 páginas de índice foram removidas, 1433 são reutilizáveis.
CPU 0.26s/2.00u sec elapsed 11.38 sec.
INFO:  índice "caeindclf_14" agora contém 6906667 versões de registros em 34523 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
256 páginas de índice foram removidas, 256 são reutilizáveis.
CPU 0.34s/2.31u sec elapsed 14.83 sec.
INFO:  índice "caeindclf_15" agora contém 6906667 versões de registros em 31569 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
109 páginas de índice foram removidas, 109 são reutilizáveis.
CPU 0.34s/2.43u sec elapsed 14.05 sec.
INFO:  índice "caeindclf_18" agora contém 6906667 versões de registros em 31570 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
451 páginas de índice foram removidas, 451 são reutilizáveis.
CPU 0.31s/2.41u sec elapsed 17.53 sec.
INFO:  índice "clifatura_pkey" agora contém 6906667 versões de registros em 15833 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
0 páginas de índice foram removidas, 0 são reutilizáveis.
CPU 0.17s/2.12u sec elapsed 11.04 sec.
INFO:  índice "caeindclf_10" agora contém 6906667 versões de registros em 20537 páginas
DETAIL:  63 versões de registros de índices foram apagadas.
1035 páginas de índice foram removidas, 1035 são reutilizáveis.
CPU 0.27s/1.97u sec elapsed 13.14 sec.


2008/11/26 Rafael Martinez <r.m.guerrero@usit.uio.no>
Rafael Domiciano wrote:

Hei

> I need some help or just some hints. I am having problems with vacuum
> full in one table only: "clifatura".
> That table has today around 7 million rows.
>

How big is the database?

> I scheduled on the server a cron job to run VACUUM FULL every day at 23
> P.M, but on the following day, at 8 A.M. vacuum is yet working on
> "clifatura"; so I have to kill vacuum.
>

Do you have any locks in this table/database that prevent the vacuum to
be started?

Why do you use VACUUM FULL everyday? This will lock your database until
it finish. I can count with one hand the times we had to run vacuum full
in our systems in the last 6 years.

Use a normal vacuum analyze if 'autovacuum' is not enough for you.

--
 Rafael Martinez, <r.m.guerrero@usit.uio.no>
 Center for Information Technology Services
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/

Re: Vacuum Problems

From
"Scott Marlowe"
Date:
2008/11/26 Rafael Domiciano <rafael.domiciano@gmail.com>:
> The database has around 40 Gb.
> If I not use vacuum full everyday the database get very slow.
>
> There is no deadlock on the database.

You didn't mention  if you were using autovacuum or not.  You also
didn't mention whether or not you'd tried regular vacuums if for some
reason you can't use autovacuum.  Vacuum full is very intrusive and
can cause index bloat.  It's more of a repair operation than a routine
maintenance one.

What do the last 10 or so lines of vacuum verbose run as a super user say?

Re: Vacuum Problems

From
"Rafael Domiciano"
Date:
I'm not using autovacuum. Regular vacuum goes ok.
To see the last 10 lines of verbose i will need to run vacuum tonight

If a run a reindex before the vacuum full, increase the "speed" of doing vacuum? I found something about it googling.

2008/11/26 Scott Marlowe <scott.marlowe@gmail.com>
2008/11/26 Rafael Domiciano <rafael.domiciano@gmail.com>:
> The database has around 40 Gb.
> If I not use vacuum full everyday the database get very slow.
>
> There is no deadlock on the database.

You didn't mention  if you were using autovacuum or not.  You also
didn't mention whether or not you'd tried regular vacuums if for some
reason you can't use autovacuum.  Vacuum full is very intrusive and
can cause index bloat.  It's more of a repair operation than a routine
maintenance one.

What do the last 10 or so lines of vacuum verbose run as a super user say?

Re: Vacuum Problems

From
"Scott Marlowe"
Date:
On Wed, Nov 26, 2008 at 10:21 AM, Rafael Domiciano
<rafael.domiciano@gmail.com> wrote:
> I'm not using autovacuum. Regular vacuum goes ok.
> To see the last 10 lines of verbose i will need to run vacuum tonight
> If a run a reindex before the vacuum full, increase the "speed" of doing
> vacuum? I found something about it googling.

You should look into enabling autovac.  You can set the sleep
parameter to 10 or 20 to keep it from hogging your I/O bandwidth.
Also, reindex AFTER the vacuum full, not before.

Re: Vacuum Problems

From
Jan-Peter Seifert
Date:
Rafael Domiciano wrote:

> The database has around 40 Gb.
>
> If I not use vacuum full everyday the database get very slow.
>
> There is no deadlock on the database.
> The vacuum does a clean in the table and in every index of the table
> "clifatura". And in the end of the vacuum, seems that vacuum is working
> hard on the table (Vacuum is using CPU and memory), but doesn't end.

Well - maybe an index on the table is broken? I did once run a script on
a database to insert a rather big amount of data. During reindexing the
db ran out of diskspace, but obviously it was freed after that so I
missed the really important error message. I just wondered why the
server had restarted in the meantime. A vacuum on the table in question
took forever. I'm not sure whether I just did a reindex or a dump and
restore to fix this. Just a suggestion. Good luck.

Peter



Re: Vacuum Problems

From
"Matthew T. O'Connor"
Date:
Rafael Domiciano wrote:
> I'm not using autovacuum. Regular vacuum goes ok.
> To see the last 10 lines of verbose i will need to run vacuum tonight
>
> If a run a reindex before the vacuum full, increase the "speed" of
> doing vacuum? I found something about it googling.

It might help a bit, but by the end of VACUUM FULL you would need to run
reindex again as VACUUM FULL tends to cause a lot of index bloat.   It
is normal for tables to have some slack space, so if you do a regular
vacuum every day (or let autovacuum) it's normal for the table to be a
bit bigger than after a VACUUM FULL, but they should ready steady state
and stop growing.

You may also want to look into using CLUSTER it will rewrite the whole
table and is typically much more efficient that VACUUM FULL, but it
requires 2x disk space while running.

Matt


Re: Vacuum Problems

From
"Scott Marlowe"
Date:
On Wed, Nov 26, 2008 at 12:54 PM, Matthew T. O'Connor <matthew@zeut.net> wrote:
> Rafael Domiciano wrote:
>>
>> I'm not using autovacuum. Regular vacuum goes ok.
>> To see the last 10 lines of verbose i will need to run vacuum tonight
>>
>> If a run a reindex before the vacuum full, increase the "speed" of doing
>> vacuum? I found something about it googling.
>
> It might help a bit, but by the end of VACUUM FULL you would need to run
> reindex again as VACUUM FULL tends to cause a lot of index bloat.   It is
> normal for tables to have some slack space, so if you do a regular vacuum
> every day (or let autovacuum) it's normal for the table to be a bit bigger
> than after a VACUUM FULL, but they should ready steady state and stop
> growing.

But there are certain use cases that would be classified as
pathological in nature, that you can't handle with regular vacuum.
It's only when you've proven that that's your case, and you can't
program around it, that you should start using vacuum full though.
vacuum full is so expensive in terms of time the system is largely
unusable combined with the need to run reindex or replace the whole
thing with cluster, that if regular or autovacuum can handle the load,
then that's what you do.

I've only seen cases where things like large imports were using a
shared table where it would get bloated insanely if three or four
imports were running at the same time with the occasional update with
no where clause.  Since you can't truncate the table, because it's
shared with other imports, you have to vacuum it, but if you bloat it
by 10x or 100x normal size in 30 seconds, no amount of regular
vacuuming will help.

So, users need to understand why they're always asked if they're
running autovacuum or not.  It's like asking someone with a
nonfunctioning dryer if they've cleaned the lint trap.  It's just
something we assume someone should try first unless there's a good
reason not to.  Because it does work so well most of the time.  I run
autovacuum.  I also email myself the output from vacuum verbose every
week, to look through and see how the tables are looking.  A quick
look near the end tells you if you're mostly ok, and quick
investigation can find bloated tables pretty fast.

So, for the OP, have you tried autovacuum, and why aren't you using
it.  Most of the time people aren't running it it's for erroneous
reasons.

Also, look into updating to 8.3 or above.  With its HOT update
mechanism, and autovacuum enabled by default it handles these
situations quite easily.

NUMA architecture and PostgreSQL

From
Fabricio
Date:
 
Hi...
 
I have an IBM server with NUMA architecture, two nodes with 4 CPU quad core and 64 GB of RAM each, and PostgrSQL 8.3.5.

Is there any way to avoid the performance degradation when the load goes up and used the two nodes?

I understand this performance degradation is due to the higher latency than a cpu access a remote memory that the  local memory. But there are ways to improve this?
 
http://archives.postgresql.org/pgsql-general/2005-04/msg01187.php
 
 
Any suggestions?

    Greetings... 
            Thanks...

 


¿Tus fotos son un desorden? La solución a tus males se llama Galería fotográfica de Windows Live

Re: NUMA architecture and PostgreSQL

From
Tino Schwarze
Date:
On Wed, Nov 26, 2008 at 06:13:41PM -0700, Fabricio wrote:

> I have an IBM server with NUMA architecture, two nodes with 4 CPU quad
> core and 64 GB of RAM each, and PostgrSQL 8.3.5. Is there any way to
> avoid the performance degradation when the load goes up and used the
> two nodes? I understand this performance degradation is due to the
> higher latency than a cpu access a remote memory that the  local
> memory. But there are ways to improve this?
>
> http://archives.postgresql.org/pgsql-general/2005-04/msg01187.php
>
> Any suggestions?     Greetings...             Thanks...

I suppose this has nothing to do with PostgreSQL but needs some
operating system level tuning. Maybe there is a way to bind the postgres
processes to one node's CPUs... What operating system are you using?
I know that there are ways to bind processes to a CPU on Linux, I'm not
sure how to assign multiple processes to multiple CPUs. The Linux
scheduler should know about NUMA architectures though, there not that
new any more.

Tino.

--
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de

Re: NUMA architecture and PostgreSQL

From
"Marko Kreen"
Date:
On 11/27/08, Fabricio <fabrixio1@hotmail.com> wrote:
>  I have an IBM server with NUMA architecture, two nodes with 4 CPU quad core and 64 GB of RAM each, and PostgrSQL
8.3.5.
>
> Is there any way to avoid the performance degradation when the load goes up and used the two nodes?
>
> I understand this performance degradation is due to the higher latency than a cpu access a remote memory that the
localmemory. But there are ways to improve this? 
>
>  http://archives.postgresql.org/pgsql-general/2005-04/msg01187.php


You can use sched_setaffinity to select group of cpu-s for a process
and it's childs:

  http://www.kernel.org/doc/man-pages/online/pages/man2/sched_setaffinity.2.html

There is a bit higher level concept of cpusets too:

  http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html

In both cases, you can either patch Postgres or write a launcher
that configures CPUs before executing postgres.

--
marko

Re: NUMA architecture and PostgreSQL

From
Fabricio
Date:
 
> I suppose this has nothing to do with PostgreSQL but needs some
> operating system level tuning. Maybe there is a way to bind the postgres
> processes to one node's CPUs... What operating system are you using?
 
I am using Red Hat Enterprise 5.2, kernel  2.6.18-92.el5 x86_64

> I know that there are ways to bind processes to a CPU on Linux, I'm not
> sure how to assign multiple processes to multiple CPUs. The Linux
> scheduler should know about NUMA architectures though, there not that
> new any more.
 
something like this?
numactl --membind=1 su - postgres -c "/usr/local/pgsql/bin/pg_ctl start -D /usr/local/pgsql/data/"
 this bind memory to node 1 to use the shared memory on node 1 and cpus preferred node 1. With this the other processes and S.O by default will use the node 0?
 
thanks...


¿Tienes cámara digital, más de una cuenta de email, planeas fiestas o blogueas? Entonces, necesitas Windows Live

Re: NUMA architecture and PostgreSQL

From
Fabricio
Date:

Hi Marko, thanks for your help...
 
> There is a bit higher level concept of cpusets too:
>
> http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html
>

> In both cases, you can either patch Postgres or write a launcher
> that configures CPUs before executing postgres.

This paragraph is the liks of cpuset:
       On systems with kernels compiled with built in support for cpusets, all
processes are attached to a cpuset, and cpusets are always present. If a
system supports cpusets, then it will have the entry nodev cpuset in the file
/proc/filesystems. By mounting the cpuset file system (see the EXAMPLE
section below), the administrator can configure the cpusets on a system to
control the processor and memory placement of processes on that system. By
default, if the cpuset configuration on a system is not modified or if the
cpuset file system is not even mounted, then the cpuset mechanism, though
present, has no affect on the system's behavior.

Do I need to mount the cpuset file system for use numactl?

Example:
numactl --membind=1 su - postgres -c "/usr/local/pgsql/bin/pg_ctl start -D /usr/local/pgsql/data/"

The two liks are about programing, but can I use only numactl with postgres?

thanks  marko...



¿Tus fotos son un desorden? La solución a tus males se llama Galería fotográfica de Windows Live

Re: NUMA architecture and PostgreSQL

From
lst_hoe02@kwsoft.de
Date:
Zitat von Fabricio <fabrixio1@hotmail.com>:

>
>
> Hi...
>
> I have an IBM server with NUMA architecture, two nodes with 4 CPU
> quad core and 64 GB of RAM each, and PostgrSQL 8.3.5. Is there any
> way to avoid the performance degradation when the load goes up and
> used the two nodes? I understand this performance degradation is due
> to the higher latency than a cpu access a remote memory that the
> local memory. But there are ways to improve this?
>
> http://archives.postgresql.org/pgsql-general/2005-04/msg01187.php

As far as i can see PostgreSQL is not able to support NUMA? With a
little search i get to rellay interesting links according to the
problem and a potential solution supported by Oracle (using huge
pages). Unfortunately PostgreSQL seams not using "huge pages" until
now??

http://oss.linbit.com/hugetlb/
http://kevinclosson.wordpress.com/kevin-closson-index/oracle-on-opteron-k8l-numa-etc/

Any comments on this from PostgreSQL developers?

Regards

Andreas


Re: Vacuum Problems

From
"Rafael Domiciano"
Date:
Hello guys,

I tried to modify my vacuum routine, and started to only run vacuum verbose analyze diary followed by a reindex weekly.
But I still having problems in my database. The uptime database is hard to stay below 10.
I'm thinking that my hardware is not more good as it was sometime ago.

The machine is a: 2 x Intel Xeon Dual-Core 2.3 GHz, 2 Gb RAM. The load on machine is about at 10000 transactions / m

Maybe I need more RAM memory?

2008/11/26 Scott Marlowe <scott.marlowe@gmail.com>
On Wed, Nov 26, 2008 at 12:54 PM, Matthew T. O'Connor <matthew@zeut.net> wrote:
> Rafael Domiciano wrote:
>>
>> I'm not using autovacuum. Regular vacuum goes ok.
>> To see the last 10 lines of verbose i will need to run vacuum tonight
>>
>> If a run a reindex before the vacuum full, increase the "speed" of doing
>> vacuum? I found something about it googling.
>
> It might help a bit, but by the end of VACUUM FULL you would need to run
> reindex again as VACUUM FULL tends to cause a lot of index bloat.   It is
> normal for tables to have some slack space, so if you do a regular vacuum
> every day (or let autovacuum) it's normal for the table to be a bit bigger
> than after a VACUUM FULL, but they should ready steady state and stop
> growing.

But there are certain use cases that would be classified as
pathological in nature, that you can't handle with regular vacuum.
It's only when you've proven that that's your case, and you can't
program around it, that you should start using vacuum full though.
vacuum full is so expensive in terms of time the system is largely
unusable combined with the need to run reindex or replace the whole
thing with cluster, that if regular or autovacuum can handle the load,
then that's what you do.

I've only seen cases where things like large imports were using a
shared table where it would get bloated insanely if three or four
imports were running at the same time with the occasional update with
no where clause.  Since you can't truncate the table, because it's
shared with other imports, you have to vacuum it, but if you bloat it
by 10x or 100x normal size in 30 seconds, no amount of regular
vacuuming will help.

So, users need to understand why they're always asked if they're
running autovacuum or not.  It's like asking someone with a
nonfunctioning dryer if they've cleaned the lint trap.  It's just
something we assume someone should try first unless there's a good
reason not to.  Because it does work so well most of the time.  I run
autovacuum.  I also email myself the output from vacuum verbose every
week, to look through and see how the tables are looking.  A quick
look near the end tells you if you're mostly ok, and quick
investigation can find bloated tables pretty fast.

So, for the OP, have you tried autovacuum, and why aren't you using
it.  Most of the time people aren't running it it's for erroneous
reasons.

Also, look into updating to 8.3 or above.  With its HOT update
mechanism, and autovacuum enabled by default it handles these
situations quite easily.

Re: Vacuum Problems

From
"Scott Marlowe"
Date:
On Mon, Dec 8, 2008 at 6:04 AM, Rafael Domiciano
<rafael.domiciano@gmail.com> wrote:
> Hello guys,
> I tried to modify my vacuum routine, and started to only run vacuum verbose
> analyze diary followed by a reindex weekly.
> But I still having problems in my database. The uptime database is hard to
> stay below 10.
> I'm thinking that my hardware is not more good as it was sometime ago.
> The machine is a: 2 x Intel Xeon Dual-Core 2.3 GHz, 2 Gb RAM. The load on
> machine is about at 10000 transactions / m
> Maybe I need more RAM memory?

Likely you need more hard drives and / or a quality caching RAID
controller.  What's your I/O subsystem look like now?

Re: Vacuum Problems

From
"Scott Marlowe"
Date:
On Mon, Dec 8, 2008 at 6:04 AM, Rafael Domiciano
<rafael.domiciano@gmail.com> wrote:
> Hello guys,
> I tried to modify my vacuum routine, and started to only run vacuum verbose
> analyze diary followed by a reindex weekly.

Have you tried running autovacuum with a naptime of 10 or 20?

Re: Vacuum Problems

From
"Rafael Domiciano"
Date:
How I do to see my I/O subsystem? vmstat? If so, follow:
[root@postgres ~]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 0 13 127728  58020  19072 1332076    3    2     3     1    1     7 11  3 52 33  0
 1 13 127728  56488  19156 1332456    0    0   644   168 2534  1528  5  3 48 44  0
 2 10 127728  55644  19216 1333136    0    0   700   640 1989  1723  5  5 46 44  0
 0 12 127728  53252  19268 1334640    0    0   864   796 2351  1833  5  4 47 43  0
 0 13 127728  53016  19296 1334208    0    0  2040   452 2277  2126  6  5 40 49  0
 0 11 127728  52944  19224 1331508    0    0  1208   484 2306  1421  5  3 21 71  0
 1 15 127728  56628  19016 1331012    0    0  1620   944 2410  2499  7  5 24 64  0
 1 12 127728  55360  19032 1332500    0    0  1040   592 2423  2074  6  5  7 82  0
 2  9 127728  54592  19068 1333344    4    0  1392   604 2695  2431  8  8 22 63  0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 4  8 127728  54632  19084 1334676    0    0   892  1164 2530  2147  5  4 53 38  0
 0 10 127728  53256  19064 1333148    0    0   920   208 3119  2765 11  5 15 68  0
 0 10 127728  54296  18856 1331404    0    0   756   572 2419  1412  7  3 40 50  0
 1  9 127728  57544  18872 1332012    0    0   816   800 2577  2390  8  7 27 59  0

The Hard Drive is a SATA-II 150 Gb, dedicate to postgres service.

2008/12/8 Scott Marlowe <scott.marlowe@gmail.com>
On Mon, Dec 8, 2008 at 6:04 AM, Rafael Domiciano
<rafael.domiciano@gmail.com> wrote:
> Hello guys,
> I tried to modify my vacuum routine, and started to only run vacuum verbose
> analyze diary followed by a reindex weekly.
> But I still having problems in my database. The uptime database is hard to
> stay below 10.
> I'm thinking that my hardware is not more good as it was sometime ago.
> The machine is a: 2 x Intel Xeon Dual-Core 2.3 GHz, 2 Gb RAM. The load on
> machine is about at 10000 transactions / m
> Maybe I need more RAM memory?

Likely you need more hard drives and / or a quality caching RAID
controller.  What's your I/O subsystem look like now?

Re: Vacuum Problems

From
"Rafael Domiciano"
Date:
No, I'm gonna to try

2008/12/8 Scott Marlowe <scott.marlowe@gmail.com>
On Mon, Dec 8, 2008 at 6:04 AM, Rafael Domiciano
<rafael.domiciano@gmail.com> wrote:
> Hello guys,
> I tried to modify my vacuum routine, and started to only run vacuum verbose
> analyze diary followed by a reindex weekly.

Have you tried running autovacuum with a naptime of 10 or 20?

Re: Vacuum Problems

From
"Scott Marlowe"
Date:
On Mon, Dec 8, 2008 at 9:11 AM, Rafael Domiciano
<rafael.domiciano@gmail.com> wrote:
> How I do to see my I/O subsystem? vmstat? If so, follow:

No, I mean, how many drives do you have, what kind of RAID controller,
if any, how they're configured, and so on.  :)  Sorry wasn't really
clear there was I?

The vmstat numbers are horrible by the way, assuming they're in 1k
blocks, you're reading and writing at < 1 Meg a second.  It almost
looks like a degraded RAID-5 array.

Re: Vacuum Problems

From
"Rafael Domiciano"
Date:
1 Drive Only. This server has no RAID.
Do you think that the I/O is very high and I'm needing a RAID?!

2008/12/8 Scott Marlowe <scott.marlowe@gmail.com>
On Mon, Dec 8, 2008 at 9:11 AM, Rafael Domiciano
> How I do to see my I/O subsystem? vmstat? If so, follow:

No, I mean, how many drives do you have, what kind of RAID controller,
if any, how they're configured, and so on.  :)  Sorry wasn't really
clear there was I?

The vmstat numbers are horrible by the way, assuming they're in 1k
blocks, you're reading and writing at < 1 Meg a second.  It almost
looks like a degraded RAID-5 array.

Re: Vacuum Problems

From
"Scott Marlowe"
Date:
On Mon, Dec 8, 2008 at 9:29 AM, Rafael Domiciano
<rafael.domiciano@gmail.com> wrote:
> 1 Drive Only. This server has no RAID.
> Do you think that the I/O is very high and I'm needing a RAID?!

Not necessarily.  Like I said, my laptop currently is about 25 to 30
times faster writing to disk than your server.  So, I think something
is wrong.  Try doing this, run vmstat 1, and while that's running, in
another window, do something like:

time dd if=/dev/zero of=/mnt/myslowdrive/testfile bs=1000000 count=500
time dd of=/dev/null if=/mnt/myslowdrive/testfile

and see what vmstat says while that's running, and how long it takes.
I'm guessing something is very wrong and you'll get really low
numbers.

Re: Vacuum Problems

From
"Scott Marlowe"
Date:
On Mon, Dec 8, 2008 at 10:17 AM, Rafael Domiciano
<rafael.domiciano@gmail.com> wrote:
> Here's the output. While the process was running my database get sometime
> without doing anything.
> You said that I probably get low numbers, but what numbers?

We're looking for MB/s and the bi/bo fields in bmstat (blocks in / blocks out)

> [root@postgres banco]# time dd if=/dev/zero of=/banco/testfile bs=1000000
> count=500
> 500+0 records in
> 500+0 records out
> 500000000 bytes (500 MB) copied, 85,4485 seconds, 5,9 MB/s
> real    1m25.451s
> user    0m0.003s
> sys     0m1.617s
> [root@postgres banco]# time dd of=/dev/null if=/banco/testfile
> 976562+1 records in
> 976562+1 records out
> 500000000 bytes (500 MB) copied, 47,5025 seconds, 10,5 MB/s
> real    0m47.543s
> user    0m0.457s
> sys     0m1.470s
> And the vmstat output:
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
> wa st
>  0  8 111760  56432  19812 1305692    0    0   744   832 2488  1518  5  3 46
> 46  0

Here, the bi/bo numbers are 768 in and 832 out.  That's k/second we're
looking at.

Lots of the same numbers cut out.

> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
> wa st
>  0  6 111760  55452  18528 1296412    0    0    88 25040 1843  1436  2  3 34
> 61  0
>  1  7 111760  52352  18540 1296472    0    0    36 15220 1467   944  3  2 27
> 68  0
>  0  6 111760  54152  18552 1296568    0    0    88 42296 1609   822  1  2 38
> 59  0
>  1  8 111760  52412  18580 1296752    0    0   172 27052 1567   950  2  3 27
> 69  0
>  0  6 111724  53904  18632 1297008  320    0   580  1504 1902  2320  5  6 23
> 66  0
>  1  6 111724  54280  18660 1297200    4    0   200  1060 2014  1783  5  3 44
> 48  0
>  1  7 111676  51388  16340 1302348    8    0   156  1212 1684   848  2  2 53
> 42  0
>  1  6 111668  55040  14864 1301048    0    4   152 46328 1595  5108  1  5 40
> 54  0

Now we're showing that we can write to the disk at 25 to 42 Megs a
second, not too bad.  But it looks really bursty, like it can sustain
this throughput for only a few seconds.   Try writing a larger file
and run vmstat 10 or 60 as well and see what the average over a longer
time with a larger file is.  I get a feeling your machine has a hard
time sustaining throughput for some reason.

Re: Vacuum Problems

From
"Rafael Domiciano"
Date:
Maybe Could be Check-Points? I have changed the parameters sometime ago. Follow:

effective_cache_size = 200MB

checkpoint_segments = 40                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 3min               # range 30s-1h

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
bgwriter_lru_percent = 16.0             # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 170             # 0-1000 buffers max written/round
bgwriter_all_percent = 7.0              # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 400             # 0-1000 buffers max written/round

Now I'm having the following problem on vacuum:
vacuumdb: vacuuming of database "Postgres" failed: ERRO:  multiple active vacuums for index "PartInd_replicacao_historico_sl_log_2-node-1".

But there's no anymore vacuum running.

2008/12/8 Scott Marlowe <scott.marlowe@gmail.com>
On Mon, Dec 8, 2008 at 10:17 AM, Rafael Domiciano
> Here's the output. While the process was running my database get sometime
> without doing anything.
> You said that I probably get low numbers, but what numbers?

We're looking for MB/s and the bi/bo fields in bmstat (blocks in / blocks out)

> [root@postgres banco]# time dd if=/dev/zero of=/banco/testfile bs=1000000
> count=500
> 500+0 records in
> 500+0 records out
> 500000000 bytes (500 MB) copied, 85,4485 seconds, 5,9 MB/s
> real    1m25.451s
> user    0m0.003s
> sys     0m1.617s
> [root@postgres banco]# time dd of=/dev/null if=/banco/testfile
> 976562+1 records in
> 976562+1 records out
> 500000000 bytes (500 MB) copied, 47,5025 seconds, 10,5 MB/s
> real    0m47.543s
> user    0m0.457s
> sys     0m1.470s
> And the vmstat output:
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
> wa st
>  0  8 111760  56432  19812 1305692    0    0   744   832 2488  1518  5  3 46
> 46  0

Here, the bi/bo numbers are 768 in and 832 out.  That's k/second we're
looking at.

Lots of the same numbers cut out.

> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
> wa st
>  0  6 111760  55452  18528 1296412    0    0    88 25040 1843  1436  2  3 34
> 61  0
>  1  7 111760  52352  18540 1296472    0    0    36 15220 1467   944  3  2 27
> 68  0
>  0  6 111760  54152  18552 1296568    0    0    88 42296 1609   822  1  2 38
> 59  0
>  1  8 111760  52412  18580 1296752    0    0   172 27052 1567   950  2  3 27
> 69  0
>  0  6 111724  53904  18632 1297008  320    0   580  1504 1902  2320  5  6 23
> 66  0
>  1  6 111724  54280  18660 1297200    4    0   200  1060 2014  1783  5  3 44
> 48  0
>  1  7 111676  51388  16340 1302348    8    0   156  1212 1684   848  2  2 53
> 42  0
>  1  6 111668  55040  14864 1301048    0    4   152 46328 1595  5108  1  5 40
> 54  0

Now we're showing that we can write to the disk at 25 to 42 Megs a
second, not too bad.  But it looks really bursty, like it can sustain
this throughput for only a few seconds.   Try writing a larger file
and run vmstat 10 or 60 as well and see what the average over a longer
time with a larger file is.  I get a feeling your machine has a hard
time sustaining throughput for some reason.

Re: Vacuum Problems

From
"Rafael Domiciano"
Date:
I think that I resolved I my problems with vacuum.
I started the autovacuum and daily runs vacuum verbose analyze, and weekly Reindex.
I'm monitoring the database now.

But, my database continues slow.
Googling a while I started to read something about pdflush and dirty_ratio.
Executing the command (while [ 1 ]; do cat /proc/meminfo | grep Dirty; sleep 1; done) I get:
Dirty:            8912 kB
Dirty:            5092 kB
Dirty:            5912 kB
Dirty:            6024 kB
Dirty:            6472 kB

With the time the Dirty Values increase and then my server does not do any query, or response, even "startup" stay on queue. So, Dirty values goes down and the server continues to proccess the queries and give response:
Dirty:           50024 kB
Dirty:           50684 kB
Dirty:           49888 kB
Dirty:           44884 kB
Dirty:           16888 kB
Dirty:           8884 kB

I set the dirty_ratio = 30 and the dirty_background_ratio = 1.

I don't know what more to do to solve this slow down problem.

Somebody have any ideia?

Thnks,

Rafael Domiciano

2008/12/9 Rafael Domiciano <rafael.domiciano@gmail.com>
Maybe Could be Check-Points? I have changed the parameters sometime ago. Follow:

effective_cache_size = 200MB

checkpoint_segments = 40                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 3min               # range 30s-1h

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
bgwriter_lru_percent = 16.0             # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 170             # 0-1000 buffers max written/round
bgwriter_all_percent = 7.0              # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 400             # 0-1000 buffers max written/round

Now I'm having the following problem on vacuum:
vacuumdb: vacuuming of database "Postgres" failed: ERRO:  multiple active vacuums for index "PartInd_replicacao_historico_sl_log_2-node-1".

But there's no anymore vacuum running.

2008/12/8 Scott Marlowe <scott.marlowe@gmail.com>
On Mon, Dec 8, 2008 at 10:17 AM, Rafael Domiciano

> Here's the output. While the process was running my database get sometime
> without doing anything.
> You said that I probably get low numbers, but what numbers?

We're looking for MB/s and the bi/bo fields in bmstat (blocks in / blocks out)

> [root@postgres banco]# time dd if=/dev/zero of=/banco/testfile bs=1000000
> count=500
> 500+0 records in
> 500+0 records out
> 500000000 bytes (500 MB) copied, 85,4485 seconds, 5,9 MB/s
> real    1m25.451s
> user    0m0.003s
> sys     0m1.617s
> [root@postgres banco]# time dd of=/dev/null if=/banco/testfile
> 976562+1 records in
> 976562+1 records out
> 500000000 bytes (500 MB) copied, 47,5025 seconds, 10,5 MB/s
> real    0m47.543s
> user    0m0.457s
> sys     0m1.470s
> And the vmstat output:
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
> wa st
>  0  8 111760  56432  19812 1305692    0    0   744   832 2488  1518  5  3 46
> 46  0

Here, the bi/bo numbers are 768 in and 832 out.  That's k/second we're
looking at.

Lots of the same numbers cut out.

> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
> wa st
>  0  6 111760  55452  18528 1296412    0    0    88 25040 1843  1436  2  3 34
> 61  0
>  1  7 111760  52352  18540 1296472    0    0    36 15220 1467   944  3  2 27
> 68  0
>  0  6 111760  54152  18552 1296568    0    0    88 42296 1609   822  1  2 38
> 59  0
>  1  8 111760  52412  18580 1296752    0    0   172 27052 1567   950  2  3 27
> 69  0
>  0  6 111724  53904  18632 1297008  320    0   580  1504 1902  2320  5  6 23
> 66  0
>  1  6 111724  54280  18660 1297200    4    0   200  1060 2014  1783  5  3 44
> 48  0
>  1  7 111676  51388  16340 1302348    8    0   156  1212 1684   848  2  2 53
> 42  0
>  1  6 111668  55040  14864 1301048    0    4   152 46328 1595  5108  1  5 40
> 54  0

Now we're showing that we can write to the disk at 25 to 42 Megs a
second, not too bad.  But it looks really bursty, like it can sustain
this throughput for only a few seconds.   Try writing a larger file
and run vmstat 10 or 60 as well and see what the average over a longer
time with a larger file is.  I get a feeling your machine has a hard
time sustaining throughput for some reason.


Re: Vacuum Problems

From
Alvaro Herrera
Date:
Rafael Domiciano escribió:

> 2008/12/9 Rafael Domiciano <rafael.domiciano@gmail.com>
>
> > Maybe Could be Check-Points? I have changed the parameters sometime ago.
> > Follow:
> > effective_cache_size = 200MB
> >
> > checkpoint_segments = 40                # in logfile segments, min 1, 16MB
> > each
> > checkpoint_timeout = 3min               # range 30s-1h

checkpoint_timeout 3 minutes?  I think you want to increase that a lot.

> > Now I'm having the following problem on vacuum:
> > vacuumdb: vacuuming of database "Postgres" failed: ERRO:  multiple active
> > vacuums for index "PartInd_replicacao_historico_sl_log_2-node-1".
> >
> > But there's no anymore vacuum running.

Something's very wrong here.  This shouldn't ever happen.  I haven't
followed this thread but you really should investigate this.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.