Thread: REINDEX takes half a day (and still not complete!)

REINDEX takes half a day (and still not complete!)

From
Phoenix Kiula
Date:
I have a large table but not as large as the kind of numbers that get
discussed on this list. It has 125 million rows.

REINDEXing the table takes half a day, and it's still not finished.

To write this post I did "SELECT COUNT(*)", and here's the output -- so long!

    select count(*) from links;
       count
    -----------
     125418191
    (1 row)

    Time: 1270405.373 ms

That's 1270 seconds!

I suppose the vacuum analyze is not doing its job? As you can see
from the settings below, I have autovacuum set to ON, and there's also a
cronjob every 10 hours to do a manual vacuum analyze on this table,
which is the largest.

PG is version 8.2.9.

Any thoughts on what I can do to improve performance!?

Below are my settings.



max_connections              = 300
shared_buffers               = 500MB
effective_cache_size         = 1GB
max_fsm_relations            = 1500
max_fsm_pages                = 950000

work_mem                     = 100MB
temp_buffers                 = 4096
authentication_timeout       = 10s
ssl                          = off
checkpoint_warning           = 3600
random_page_cost             = 1

autovacuum                   = on
autovacuum_vacuum_cost_delay = 20

vacuum_cost_delay            = 20
vacuum_cost_limit            = 600

autovacuum_naptime           = 10
stats_start_collector        = on
stats_row_level              = on
autovacuum_vacuum_threshold  = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor  = 0.02
autovacuum_vacuum_scale_factor   = 0.01

wal_buffers                  = 64
checkpoint_segments          = 128
checkpoint_timeout           = 900
fsync                        = on
maintenance_work_mem         = 512MB

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> I have a large table but not as large as the kind of numbers that get
> discussed on this list. It has 125 million rows.
>
> REINDEXing the table takes half a day, and it's still not finished.
>
> To write this post I did "SELECT COUNT(*)", and here's the output -- so long!
>
>    select count(*) from links;
>       count
>    -----------
>     125418191
>    (1 row)
>
>    Time: 1270405.373 ms
>
> That's 1270 seconds!
>
> I suppose the vaccuum analyze is not doing its job? As you can see
> from settings below, I have autovacuum set to ON, and there's also a
> cronjob every 10 hours to do a manual vacuum analyze on this table,
> which is largest.
>
> PG is version 8.2.9.
>
> Any thoughts on what I can do to improve performance!?
>
> Below are my settings.
>
>
>
> max_connections              = 300
> shared_buffers               = 500MB
> effective_cache_size         = 1GB
> max_fsm_relations            = 1500
> max_fsm_pages                = 950000
>
> work_mem                     = 100MB

What is the output of running VACUUM VERBOSE as a superuser? (You can
run it on the postgres database so it returns fast.)  We're looking
for output that looks like this:

INFO:  free space map contains 1930193 pages in 749 relations
DETAIL:  A total of 1787744 page slots are in use (including overhead).
1787744 page slots are required to track all free space.
Current limits are:  10000000 page slots, 3000 relations, using 58911 kB.

If the page slots required exceed the page slots available, then you need
to crank up max_fsm_pages.  If the number of relations exceeds the relation
limit, then you'll need to crank up max_fsm_relations.
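For example (a sketch only -- the real targets must come from your own
VACUUM VERBOSE output, these numbers are purely illustrative), you'd raise
the limits in postgresql.conf above the reported requirements and restart,
since both settings can only be changed at server start on 8.2:

    # postgresql.conf -- illustrative values, not a recommendation
    max_fsm_pages     = 2000000   # must cover the "page slots are required" figure
    max_fsm_relations = 2000      # must cover the number of relations reported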

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> autovacuum                   = on
> autovacuum_vacuum_cost_delay = 20
>
> vacuum_cost_delay            = 20
> vacuum_cost_limit            = 600
>
> autovacuum_naptime           = 10

Also, if vacuum can't keep up you can increase the vacuum cost limit
and lower the cost delay.  Anything above 1ms is still quite a wait
compared to 0, and most systems don't have timer granularity that fine
anyway, so 5ms is about as low as you can usefully go before dropping
straight to 0.  And if you've got a lot of large relations you might
need to increase the max number of autovacuum workers as well.
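As a sketch, that might look like the following in postgresql.conf
(values are illustrative only -- tune them against the I/O headroom you
actually have):

    autovacuum_vacuum_cost_delay = 5     # milliseconds; 0 lets autovacuum run flat out
    vacuum_cost_delay            = 0     # manual VACUUMs run at full speed
    vacuum_cost_limit            = 2000  # up from the default of 200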

Re: REINDEX takes half a day (and still not complete!)

From
Phoenix Kiula
Date:
Thanks Scott.

> What is the output of running vacuum verbose as a superuser (you can
> run it on the postgres database so it returns fast.)


Here's the output for postgres DB:

    INFO:  free space map contains 110614 pages in 33 relations
    DETAIL:  A total of 110464 page slots are in use (including overhead).
    110464 page slots are required to track all free space.
    Current limits are:  950000 page slots, 1500 relations, using 5665 kB.
    VACUUM


Does running it on the postgres database also show the relevant info for
the other databases?

From the above it seems fine, right?



> also, if vacuum can't keep up you can increase the vacuum cost limit,
> and lower the cost delay.  Anything above 1ms is still quite a wait
> compared to 0.  And most systems don't have the real granularity to go
> that low anyway, so 5ms is about as low as you can go and get a change
> before 0.  Also, if you've got a lot of large relations you might need
> to increase the max workers as well.


I'm not sure I understand this.

(1) I should increase "max workers". But I am on version 8.2.9 -- did
this version have "autovacuum_max_workers"? It seems to be a more
recent thing: http://sn.im/27nxe1

(2) The big table in my database (with 125 million rows) has about
5,000 rows that get DELETEd every day, about 100,000 new INSERTs, and
about 12,000 UPDATEs.

(3) What's that thing about cost delay? Which values from vacuum
should I check to determine the cost delay -- is there a specific
formula?

Thanks!




On Sat, Mar 19, 2011 at 12:58 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> I have a large table but not as large as the kind of numbers that get
>> discussed on this list. It has 125 million rows.
>>
>> REINDEXing the table takes half a day, and it's still not finished.
>>
>> To write this post I did "SELECT COUNT(*)", and here's the output -- so long!
>>
>>    select count(*) from links;
>>       count
>>    -----------
>>     125418191
>>    (1 row)
>>
>>    Time: 1270405.373 ms
>>
>> That's 1270 seconds!
>>
>> I suppose the vaccuum analyze is not doing its job? As you can see
>> from settings below, I have autovacuum set to ON, and there's also a
>> cronjob every 10 hours to do a manual vacuum analyze on this table,
>> which is largest.
>>
>> PG is version 8.2.9.
>>
>> Any thoughts on what I can do to improve performance!?
>>
>> Below are my settings.
>>
>>
>>
>> max_connections              = 300
>> shared_buffers               = 500MB
>> effective_cache_size         = 1GB
>> max_fsm_relations            = 1500
>> max_fsm_pages                = 950000
>>
>> work_mem                     = 100MB
>
> What is the output of running vacuum verbose as a superuser (you can
> run it on the postgres database so it returns fast.)  We're looking
> for the output that looks like this:
>
> INFO:  free space map contains 1930193 pages in 749 relations
> DETAIL:  A total of 1787744 page slots are in use (including overhead).
> 1787744 page slots are required to track all free space.
> Current limits are:  10000000 page slots, 3000 relations, using 58911 kB.
>
> If the space needed exceeds page slots then you need to crank up your
> free space map.  If the relations exceeds the available then you'll
> need to crank up max relations.
>

Re: REINDEX takes half a day (and still not complete!)

From
Merlin Moncure
Date:
On Fri, Mar 18, 2011 at 10:07 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> I have a large table but not as large as the kind of numbers that get
> discussed on this list. It has 125 million rows.
>
> REINDEXing the table takes half a day, and it's still not finished.
>
> To write this post I did "SELECT COUNT(*)", and here's the output -- so long!
>
>    select count(*) from links;
>       count
>    -----------
>     125418191
>    (1 row)
>
>    Time: 1270405.373 ms
>
> That's 1270 seconds!
>
> I suppose the vaccuum analyze is not doing its job? As you can see
> from settings below, I have autovacuum set to ON, and there's also a
> cronjob every 10 hours to do a manual vacuum analyze on this table,
> which is largest.
>
> PG is version 8.2.9.
>
> Any thoughts on what I can do to improve performance!?
>
> Below are my settings.
>
>
>
> max_connections              = 300
> shared_buffers               = 500MB
> effective_cache_size         = 1GB
> max_fsm_relations            = 1500
> max_fsm_pages                = 950000
>
> work_mem                     = 100MB
> temp_buffers                 = 4096
> authentication_timeout       = 10s
> ssl                          = off
> checkpoint_warning           = 3600
> random_page_cost             = 1
>
> autovacuum                   = on
> autovacuum_vacuum_cost_delay = 20
>
> vacuum_cost_delay            = 20
> vacuum_cost_limit            = 600
>
> autovacuum_naptime           = 10
> stats_start_collector        = on
> stats_row_level              = on
> autovacuum_vacuum_threshold  = 75
> autovacuum_analyze_threshold = 25
> autovacuum_analyze_scale_factor  = 0.02
> autovacuum_vacuum_scale_factor   = 0.01
>
> wal_buffers                  = 64
> checkpoint_segments          = 128
> checkpoint_timeout           = 900
> fsync                        = on
> maintenance_work_mem         = 512MB

How much memory do you have? You might want to consider raising
maintenance_work_mem to 1GB.  Are other things going on in the
database while you are rebuilding your indexes?  Is it possible you
are blocked waiting on a lock for a while?
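If you want to check for that, here's a quick sketch you can run while
the REINDEX seems stuck (8.2 column names; later releases use pid and
query instead of procpid and current_query):

    SELECT l.pid, l.mode, l.granted, a.usename, a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
    WHERE NOT l.granted;

Any row returned is a backend waiting on a lock held by someone else.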

How much index data is there?  Can we see the table definition along
with create index statements?

merlin

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Mon, Mar 21, 2011 at 8:14 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Thanks Merlin, Scott.
>
> First, yes, I can increase maintenance_work_memory. I have 8GB RAM in
> total, and sure, I can dedicate 1GB of it to PG. Currently PG is the
> most intensive software here.

If we're talking maintenance_work_mem, then you might want to set it
for a single connection:

    set maintenance_work_mem = '1000MB';
    reindex yada yada;

etc.  So it's not global, just local.
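For example (a sketch -- substitute your own index name; links is the big
table from this thread):

    SET maintenance_work_mem = '1000MB';  -- session-local, leaves the global setting alone
    REINDEX TABLE links;                  -- or REINDEX INDEX some_index_name;
    RESET maintenance_work_mem;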

> Second, how can I check if there are other things going on in the
> database while i REINDEX? Maybe some kind of vacuum is going on, but
> isn't that supposed to wait while REINDEX is happening for at least
> this table?

OK, my main point has been that if autovacuum is running well enough,
then you don't need REINDEX; and if you do run it, it's a maintenance
step you shouldn't have to schedule all the time -- only something to
run until you get autovacuum tuned up enough to handle your db during
the day.  However, I know sometimes you're stuck with what you're stuck
with.

You can see what else is running with the pg_stat_activity view,
which will show you all running queries.  That and iotop can show you
which processes are chewing up how much IO.  The other pg_stat_*
tables can give you a good idea of what's happening to your tables in
the database.  iostat and vmstat can give you an idea how much IO
bandwidth you're using.
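For example (a sketch using the 8.2 column names; newer versions rename
procpid and current_query to pid and query):

    SELECT procpid, usename, waiting, query_start, current_query
    FROM pg_stat_activity
    ORDER BY query_start;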

If a vacuum starts after the reindex it will either wait or abort and
not get in the way.  If a vacuum is already running I'm not sure if it
will get killed or not.

Re: REINDEX takes half a day (and still not complete!)

From
Phoenix Kiula
Date:
Sorry, rejuvenating a thread that was basically unanswered.

I closed the database for any kinds of access to focus on maintenance
operations, killed all earlier processes so that my maintenance is the
only stuff going on.

REINDEX is still taking 3 hours -- and it is still not finished!

Similarly, if I cancel the REINDEX and issue a VACUUM ANALYZE VERBOSE,
this too seems to just hang there on my big table.

I changed the maintenance_work_mem to 2GB for this operation. It's
highly worrisome -- the above slow times are with 2GB of my server
dedicated to Postgresql!!!!

Surely this is not tenable for enterprise environments? I am on a
64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
called. Postgres is 8.2.9.

How do DB folks do this with small maintenance windows? This is for a
very high traffic website so it's beginning to get embarrassing.

Would appreciate any thoughts or pointers.

Thanks!



On Mon, Mar 21, 2011 at 9:28 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 10:07 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> I have a large table but not as large as the kind of numbers that get
>> discussed on this list. It has 125 million rows.
>>
>> REINDEXing the table takes half a day, and it's still not finished.
>>
>> To write this post I did "SELECT COUNT(*)", and here's the output -- so long!
>>
>>    select count(*) from links;
>>       count
>>    -----------
>>     125418191
>>    (1 row)
>>
>>    Time: 1270405.373 ms
>>
>> That's 1270 seconds!
>>
>> I suppose the vaccuum analyze is not doing its job? As you can see
>> from settings below, I have autovacuum set to ON, and there's also a
>> cronjob every 10 hours to do a manual vacuum analyze on this table,
>> which is largest.
>>
>> PG is version 8.2.9.
>>
>> Any thoughts on what I can do to improve performance!?
>>
>> Below are my settings.
>>
>>
>>
>> max_connections              = 300
>> shared_buffers               = 500MB
>> effective_cache_size         = 1GB
>> max_fsm_relations            = 1500
>> max_fsm_pages                = 950000
>>
>> work_mem                     = 100MB
>> temp_buffers                 = 4096
>> authentication_timeout       = 10s
>> ssl                          = off
>> checkpoint_warning           = 3600
>> random_page_cost             = 1
>>
>> autovacuum                   = on
>> autovacuum_vacuum_cost_delay = 20
>>
>> vacuum_cost_delay            = 20
>> vacuum_cost_limit            = 600
>>
>> autovacuum_naptime           = 10
>> stats_start_collector        = on
>> stats_row_level              = on
>> autovacuum_vacuum_threshold  = 75
>> autovacuum_analyze_threshold = 25
>> autovacuum_analyze_scale_factor  = 0.02
>> autovacuum_vacuum_scale_factor   = 0.01
>>
>> wal_buffers                  = 64
>> checkpoint_segments          = 128
>> checkpoint_timeout           = 900
>> fsync                        = on
>> maintenance_work_mem         = 512MB
>
> how much memory do you have? you might want to consider raising
> maintenance_work_mem to 1GB.  Are other things going on in the
> database while you are rebuilding your indexes?  Is it possible you
> are blocked waiting on a lock for a while?
>
> How much index data is there?  Can we see the table definition along
> with create index statements?
>
> merlin
>

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Sun, Apr 17, 2011 at 9:30 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Sorry, rejuvenating a thread that was basically unanswered.
>
> I closed the database for any kinds of access to focus on maintenance
> operations, killed all earlier processes so that my maintenance is the
> only stuff going on.
>
> REINDEX is still taking 3 hours -- and it is still not finished!
>
> Similarly, if I cancel the REINDEX and issue a VACUUM ANALYZE VERBOSE,
> this too seems to just hang there on my big table.
>
> I changed the maintenance_work_men to 2GB for this operation. It's
> highly worrisome -- the above slow times are with 2GB of my server
> dedicated to Postgresql!!!!
>
> Surely this is not tenable for enterprise environments? I am on a
> 64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
> called. Postgres is 8.2.9.
>
> How do DB folks do this with small maintenance windows? This is for a
> very high traffic website so it's beginning to get embarrassing.
>
> Would appreciate any thoughts or pointers.

Upgrade to something more modern than 8.2.x.  Autovacuum was still
very much in its infancy back then.  9.0 or higher is a good choice.
What do iostat -xd 10 and vmstat 10 and top say about these processes
when they're running?  "It's taking a really long time and seems like
it's hanging" tells us nothing useful.  Your OS has tools to let you
figure out what's bottlenecking your operations, so get familiar with
them and let us know what they tell you.  These are all suggestions I
made before which you have now classified as "not answering your
questions" so I'm getting a little tired of helping you when you don't
seem interested in helping yourself.

What are your vacuum and autovacuum costing values set to?  Can you
make vacuum and/or autovacuum more aggressive?

Re: REINDEX takes half a day (and still not complete!)

From
Jesper Krogh
Date:


> How do DB folks do this with small maintenance windows? This is for a
> very high traffic website so it's beginning to get embarrassing.

Normally there is no need to issue REINDEX.  What's the reason you need it?

Jesper


Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Sun, Apr 17, 2011 at 9:44 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Apr 17, 2011 at 9:30 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Sorry, rejuvenating a thread that was basically unanswered.
>>
>> I closed the database for any kinds of access to focus on maintenance
>> operations, killed all earlier processes so that my maintenance is the
>> only stuff going on.
>>
>> REINDEX is still taking 3 hours -- and it is still not finished!
>>
>> Similarly, if I cancel the REINDEX and issue a VACUUM ANALYZE VERBOSE,
>> this too seems to just hang there on my big table.
>>
>> I changed the maintenance_work_men to 2GB for this operation. It's
>> highly worrisome -- the above slow times are with 2GB of my server
>> dedicated to Postgresql!!!!
>>
>> Surely this is not tenable for enterprise environments? I am on a
>> 64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
>> called. Postgres is 8.2.9.
>>
>> How do DB folks do this with small maintenance windows? This is for a
>> very high traffic website so it's beginning to get embarrassing.
>>
>> Would appreciate any thoughts or pointers.
>
> Upgrade to something more modern than 8.2.x.  Autovacuum was still
> very much in its infancy back then.  9.0 or higher is a good choice.
> What do iostat -xd 10 and vmstat 10 and top say about these processes
> when they're running.  "It's taking a really long time and seems like
> it's hanging" tells us nothing useful.  Your OS has tools to let you
> figure out what's bottlenecking your operations, so get familiar with
> them and let us know what they tell you.  These are all suggestions I
> made before which you have now classified as "not answering your
> questions" so I'm getting a little tired of helping you when you don't
> seem interested in helping yourself.
>
> What are your vacuum and autovacuum costing values set to?  Can you
> make vacuum and / or autovacuum more aggresive?

Also, a few more questions: what are you using for storage?  How many
drives, what RAID controller (if any), and what RAID configuration?

Re: REINDEX takes half a day (and still not complete!)

From
Phoenix
Date:
Thanks Scott.

I have shared huge amounts of info in my emails to Merlin and you.
Intentionally not shared in public. Apologies if you are feeling
tired.

The reason I need to REINDEX is that a simple SELECT query based on
the indexed column is taking ages. It used to take less than a second. I
want to make sure that the index is properly in place, at least.

We went through some BLOAT reports. Apparently Merlin told me there's
no significant bloat.

A manual VACUUM right now takes ages too. AUTOVACUUM settings are below.

It's a RAID 1 setup. Two Raptor 10000rpm disks.

TOP does not show much beyond "postmaster". How should I use TOP and
what info can I give you? This is what it looks like:


14231 root      18   0  4028  872  728 R 93.8  0.0  28915:37 exim_dbmbuild
11001 root      25   0  4056  864  716 R 93.8  0.0  23111:06 exim_dbmbuild
16400 root      25   0  4824  864  720 R 92.5  0.0  33843:52 exim_dbmbuild
 4799 postgres  15   0  532m  94m  93m D  0.7  1.2   0:00.14 postmaster
12292 nobody    15   0 48020  14m 5088 S  0.7  0.2   0:00.06 httpd
12943 root      17   0  2828 1224  776 R  0.7  0.0   0:00.04 top
 7236 mysql     16   0  224m  64m 3692 S  0.3  0.8  26:43.46 mysqld
31421 postgres  15   0  530m  12m  12m S  0.3  0.2   0:03.08 postmaster
31430 postgres  15   0 10456  576  224 S  0.3  0.0   0:00.08 postmaster
  955 postgres  15   0  532m  91m  90m S  0.3  1.1   0:00.15 postmaster
 1054 postgres  15   0  532m 196m 195m S  0.3  2.4   0:00.37 postmaster
 1232 postgres  15   0  532m  99m  98m D  0.3  1.2   0:00.27 postmaster
 1459 postgres  15   0  532m  86m  85m S  0.3  1.1   0:00.12 postmaster
 4552 postgres  15   0  532m  86m  85m S  0.3  1.1   0:00.08 postmaster
 7187 postgres  15   0  532m 157m 155m S  0.3  1.9   0:00.19 postmaster
 7587 postgres  15   0  532m 175m 173m D  0.3  2.2   0:00.23 postmaster
 8131 postgres  15   0  532m 154m 152m S  0.3  1.9   0:00.15 postmaster
 9473 nobody    16   0 48268  15m 5800 S  0.3  0.2   0:00.34 httpd
 9474 nobody    15   0 48096  14m 5472 S  0.3  0.2   0:00.27 httpd
10688 nobody    16   0     0    0    0 Z  0.3  0.0   0:00.20 httpd <defunct>
12261 nobody    15   0 47956  13m 4296 S  0.3  0.2   0:00.08 httpd
12278 nobody    15   0 47956  13m 4052 S  0.3  0.2   0:00.04 httpd
12291 nobody    15   0 47972  14m 4956 S  0.3  0.2   0:00.07 httpd
12673 nobody    15   0 47912  13m 4180 S  0.3  0.2   0:00.02 httpd
12674 nobody    15   0 47936  13m 4924 S  0.3  0.2   0:00.02 httpd
12678 nobody    16   0 47912  13m 4060 S  0.3  0.2   0:00.01 httpd
12727 nobody    15   0 47912  13m 4024 S  0.3  0.2   0:00.03 httpd
12735 nobody    15   0 47912  13m 4144 S  0.3  0.2   0:00.02 httpd


VMSTAT 10 shows this:


 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 3 14  99552  17900  41108 7201712    0    0    42    11    0     0  8 34 41 16
 2 17  99552  16468  41628 7203012    0    0  1326    84 1437 154810  7 66 12 15
 3  7  99476  16796  41056 7198976    0    0  1398    96 1453 156211  7 66 21  6
 3 17  99476  17228  39132 7177240    0    0  1325    68 1529 156111  8 65 16 11




The results of "iostat -xd 10" are:



Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.24  24.55  9.33  4.41  111.31  231.75    55.65   115.88    24.97     0.17   12.09   6.67   9.17
sdb          0.06  97.65  2.21  3.97   91.59  389.58    45.80   194.79    77.84     0.06    9.95   2.73   1.69
sdc          1.46  62.71 187.20 29.13  132.43  311.72    66.22   155.86     2.05     0.36    1.65   1.12  24.33

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   7.41  0.30  3.50    2.40   87.29     1.20    43.64    23.58     0.13   32.92  10.03   3.81
sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc          0.00  18.32 158.26  4.10 2519.32  180.98  1259.66    90.49    16.63    13.04   79.91   6.17 100.11

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   6.21  0.00  1.40    0.00   60.86     0.00    30.43    43.43     0.03   20.07  15.00   2.10
sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc          0.10  10.31 159.06  2.50 2635.44  101.70  1317.72    50.85    16.94    12.82   79.44   6.20 100.12




8GB memory in total. 1GB devoted to PGSQL during these operations.
Otherwise, my settings are as follows (and yes I did make the vacuum
settings more aggressive based on your email, which has had no
apparent impact) --

max_connections              = 350
shared_buffers               = 500MB
effective_cache_size         = 1250MB
max_fsm_relations            = 1500
max_fsm_pages                = 950000
work_mem                     = 100MB
maintenance_work_mem         = 200MB
temp_buffers                 = 4096
authentication_timeout       = 10s
ssl                          = off
checkpoint_warning           = 3600
random_page_cost             = 1



What else can I share?

Thanks much for offering to help.



On Sun, Apr 17, 2011 at 11:44 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Apr 17, 2011 at 9:30 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Sorry, rejuvenating a thread that was basically unanswered.
>>
>> I closed the database for any kinds of access to focus on maintenance
>> operations, killed all earlier processes so that my maintenance is the
>> only stuff going on.
>>
>> REINDEX is still taking 3 hours -- and it is still not finished!
>>
>> Similarly, if I cancel the REINDEX and issue a VACUUM ANALYZE VERBOSE,
>> this too seems to just hang there on my big table.
>>
>> I changed the maintenance_work_men to 2GB for this operation. It's
>> highly worrisome -- the above slow times are with 2GB of my server
>> dedicated to Postgresql!!!!
>>
>> Surely this is not tenable for enterprise environments? I am on a
>> 64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
>> called. Postgres is 8.2.9.
>>
>> How do DB folks do this with small maintenance windows? This is for a
>> very high traffic website so it's beginning to get embarrassing.
>>
>> Would appreciate any thoughts or pointers.
>
> Upgrade to something more modern than 8.2.x.  Autovacuum was still
> very much in its infancy back then.  9.0 or higher is a good choice.
> What do iostat -xd 10 and vmstat 10 and top say about these processes
> when they're running.  "It's taking a really long time and seems like
> it's hanging" tells us nothing useful.  Your OS has tools to let you
> figure out what's bottlenecking your operations, so get familiar with
> them and let us know what they tell you.  These are all suggestions I
> made before which you have now classified as "not answering your
> questions" so I'm getting a little tired of helping you when you don't
> seem interested in helping yourself.
>
> What are your vacuum and autovacuum costing values set to?  Can you
> make vacuum and / or autovacuum more aggresive?
>

Re: REINDEX takes half a day (and still not complete!)

From
Alan Hodgson
Date:

On April 17, 2011, Phoenix <phoenix.kiula@gmail.com> wrote:

> >> Surely this is not tenable for enterprise environments? I am on a
> >> 64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
> >> called. Postgres is 8.2.9.
> >>

.. and you have essentially 1 disk drive. Your hardware is not sized for a database server.

>> it's a RAID 1 setup. Two Raptor 10000rpm disks.

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Sun, Apr 17, 2011 at 10:59 AM, Phoenix <phoenix.kiula@gmail.com> wrote:
> TOP does not show much beyond "postmaster". How should I use TOP and
> what info can I give you? This is what it looks like:

We're basically looking to see if the postmaster process doing the
vacuuming or reindexing is stuck in a D state, which means it's
waiting on IO.
Hit the c key while it's running and you should get a little more info
on which processes are what.

>  4799 postgres  15   0  532m  94m  93m D  0.7  1.2   0:00.14
> postmaster

That is likely the postmaster that is waiting on IO.

> VMSTAT 10 shows this:
>
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  3 14  99552  17900  41108 7201712    0    0    42    11    0     0  8 34 41 16
>  2 17  99552  16468  41628 7203012    0    0  1326    84 1437 154810  7 66 12 15
>  3  7  99476  16796  41056 7198976    0    0  1398    96 1453 156211  7 66 21  6
>  3 17  99476  17228  39132 7177240    0    0  1325    68 1529 156111  8 65 16 11

So, we're at 11 to 15% io wait.  I'm gonna guess you have 8 cores /
threads in your CPUs, and 1/8th of 100% is 12%, so it looks like you're
probably IO bound here.  iostat tells us more:

> The results of "iostat -xd 10" is:
> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz   await  svctm  %util
> sda          0.00   7.41  0.30  3.50    2.40   87.29     1.20    43.64
>   23.58     0.13   32.92  10.03   3.81
> sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00
>    0.00     0.00    0.00   0.00   0.00
> sdc          0.00  18.32 158.26  4.10 2519.32  180.98  1259.66
> 90.49    16.63    13.04   79.91   6.17 100.11

100% IO utilization, so yea, it's likely that your sdc drive is your
bottleneck.  Given how little data is actually moving through the sdc
drive, it's not very fast.

> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s

> 8GB memory in total. 1GB devoted to PGSQL during these operations.
> Otherwise, my settings are as follows (and yes I did make the vacuum
> settings more aggressive based on your email, which has had no
> apparent impact) --

Yeah, as it gets more aggressive it can use more of your IO bandwidth.
 Since you

> What else can I share?

That's a lot of help.  I'm assuming you're running software or
motherboard fake-raid on this RAID-1 set?  I'd suggest buying a $500
or so battery-backed caching RAID controller first; the improvements
in performance are huge with such a card.  You might wanna try testing
the current RAID-1 set with bonnie++ to get an idea of how fast it is.

Re: REINDEX takes half a day (and still not complete!)

From
Phoenix Kiula
Date:
Thanks for these suggestions.

I am beginning to wonder if the issue is deeper.

I set autovacuum to off, then turned off all the connections to the
database, and did a manual vacuum just to see how long it takes.

This was last night my time. I woke up this morning and it has still
not finished.

The maintenance_work_mem given to the DB for this process was 2GB.

There is nothing else going on on the server! Now, even REINDEX is
just failing in the middle:


# REINDEX INDEX new_idx_userid;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


What else could be wrong?




On Mon, Apr 18, 2011 at 2:38 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Apr 17, 2011 at 10:59 AM, Phoenix <phoenix.kiula@gmail.com> wrote:
>> TOP does not show much beyond "postmaster". How should I use TOP and
>> what info can I give you? This is what it looks like:
>
> We're basically looking to see if the postmaster process doing the
> vacuuming or reindexing is stuck in a D state, which means it's
> waiting on IO.
> hot the c key while it's running and you should get a little more info
> on which processes are what.
>
>>  4799 postgres  15   0  532m  94m  93m D  0.7  1.2   0:00.14
>> postmaster
>
> That is likely the postmaster that is waiting on IO.
>
>> VMSTAT 10 shows this:
>>
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>>  3 14  99552  17900  41108 7201712    0    0    42    11    0     0  8 34 41 16
>>  2 17  99552  16468  41628 7203012    0    0  1326    84 1437 154810  7 66 12 15
>>  3  7  99476  16796  41056 7198976    0    0  1398    96 1453 156211  7 66 21  6
>>  3 17  99476  17228  39132 7177240    0    0  1325    68 1529 156111  8 65 16 11
>
> So, we're at 11 to 15% io wait.  I'm gonna guess you have 8 cores /
> threads in your CPUs, and 1/8th ot 100% is 12% so looks like you're
> probably IO bound here.  iostat tells us more:
>
>> The results of "iostat -xd 10" is:
>> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
>> avgrq-sz avgqu-sz   await  svctm  %util
>> sda          0.00   7.41  0.30  3.50    2.40   87.29     1.20    43.64
>>   23.58     0.13   32.92  10.03   3.81
>> sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00
>>    0.00     0.00    0.00   0.00   0.00
>> sdc          0.00  18.32 158.26  4.10 2519.32  180.98  1259.66
>> 90.49    16.63    13.04   79.91   6.17 100.11
>
> 100% IO utilization, so yea, it's likely that your sdc drive is your
> bottleneck.  Given our little data is actually moving through the sdc
> drive, it's not very fast.
>
>> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
>
>> 8GB memory in total. 1GB devoted to PGSQL during these operations.
>> Otherwise, my settings are as follows (and yes I did make the vacuum
>> settings more aggressive based on your email, which has had no
>> apparent impact) --
>
> Yeah, as it gets more aggressive it can use more of your IO bandwidth.
>  Since you
>
>> What else can I share?
>
> That's a lot of help.  I'm assuming you're running software or
> motherboard fake-raid on this RAID-1 set?  I'd suggest buying a $500
> or so battery backed caching RAID controller first,  the improvements
> in performance are huge with such a card.  You might wanna try testing
> the current RAID-1 set with bonnie++ to get an idea of how fast it is.
>

Re: REINDEX takes half a day (and still not complete!)

From
Phoenix Kiula
Date:
Btw, hardware is not an issue. My db has been working fine for a
while. Smaller poorer systems around the web run InnoDB databases. I
wouldn't touch that with a barge pole.

I have a hardware RAID controller, not "fake". It's a good quality
battery-backed 3Ware:
http://192.19.193.26/products/serial_ata2-9000.asp



On Mon, Apr 18, 2011 at 1:14 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Thanks for these suggestions.
>
> I am beginning to wonder if the issue is deeper.
>
> I set autovacuum to off, then turned off all the connections to the
> database, and did a manual vacuum just to see how long it takes.
>
> This was last night my time. I woke up this morning and it has still
> not finished.
>
> The maintenance_men given to the DB for this process was 2GB.
>
> There is nothing else going on on the server! Now, even REINDEX is
> just failing in the middle:
>
>
> # REINDEX INDEX new_idx_userid;
> server closed the connection unexpectedly
>        This probably means the server terminated abnormally
>        before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
>
> What else could be wrong?
>
>
>
>
> On Mon, Apr 18, 2011 at 2:38 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Sun, Apr 17, 2011 at 10:59 AM, Phoenix <phoenix.kiula@gmail.com> wrote:
>>> TOP does not show much beyond "postmaster". How should I use TOP and
>>> what info can I give you? This is what it looks like:
>>
>> We're basically looking to see if the postmaster process doing the
>> vacuuming or reindexing is stuck in a D state, which means it's
>> waiting on IO.
>> hot the c key while it's running and you should get a little more info
>> on which processes are what.
>>
>>>  4799 postgres  15   0  532m  94m  93m D  0.7  1.2   0:00.14
>>> postmaster
>>
>> That is likely the postmaster that is waiting on IO.
>>
>>> VMSTAT 10 shows this:
>>>
>>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>>>  3 14  99552  17900  41108 7201712    0    0    42    11    0     0  8 34 41 16
>>>  2 17  99552  16468  41628 7203012    0    0  1326    84 1437 154810  7 66 12 15
>>>  3  7  99476  16796  41056 7198976    0    0  1398    96 1453 156211  7 66 21  6
>>>  3 17  99476  17228  39132 7177240    0    0  1325    68 1529 156111  8 65 16 11
>>
>> So, we're at 11 to 15% io wait.  I'm gonna guess you have 8 cores /
>> threads in your CPUs, and 1/8th ot 100% is 12% so looks like you're
>> probably IO bound here.  iostat tells us more:
>>
>>> The results of "iostat -xd 10" is:
>>> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
>>> avgrq-sz avgqu-sz   await  svctm  %util
>>> sda          0.00   7.41  0.30  3.50    2.40   87.29     1.20    43.64
>>>   23.58     0.13   32.92  10.03   3.81
>>> sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00
>>>    0.00     0.00    0.00   0.00   0.00
>>> sdc          0.00  18.32 158.26  4.10 2519.32  180.98  1259.66
>>> 90.49    16.63    13.04   79.91   6.17 100.11
>>
>> 100% IO utilization, so yea, it's likely that your sdc drive is your
>> bottleneck.  Given our little data is actually moving through the sdc
>> drive, it's not very fast.
>>
>>> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
>>
>>> 8GB memory in total. 1GB devoted to PGSQL during these operations.
>>> Otherwise, my settings are as follows (and yes I did make the vacuum
>>> settings more aggressive based on your email, which has had no
>>> apparent impact) --
>>
>> Yeah, as it gets more aggressive it can use more of your IO bandwidth.
>>  Since you
>>
>>> What else can I share?
>>
>> That's a lot of help.  I'm assuming you're running software or
>> motherboard fake-raid on this RAID-1 set?  I'd suggest buying a $500
>> or so battery backed caching RAID controller first,  the improvements
>> in performance are huge with such a card.  You might wanna try testing
>> the current RAID-1 set with bonnie++ to get an idea of how fast it is.
>>
>

Re: REINDEX takes half a day (and still not complete!)

From
Claudio Freire
Date:
On Mon, Apr 18, 2011 at 7:14 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> # REINDEX INDEX new_idx_userid;
> server closed the connection unexpectedly
>        This probably means the server terminated abnormally
>        before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
>
> What else could be wrong?


That's hardly enough information to guess, but since we're trying to
guess, maybe your maintenance_work_mem went overboard and your server ran
out of RAM. Or disk space.

Aside from a bug, that's the only reason I can think of for a pg backend
to bail out like that. Well, the connection could have been cut off by
other means (i.e. someone tripped on the cable or something), but let's
not dwell on those options.

Re: REINDEX takes half a day (and still not complete!)

From
Claudio Freire
Date:
On Mon, Apr 18, 2011 at 8:39 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> Aside from a bug, that's the only reason I can think for a pg backend
> to bail out like that. Well, the connection could have been cut off by
> other means (ie: someone tripped on the cable or something), but lets
> not dwell on those options.


Sorry for the double-post, but I should add: you really should set up
some kind of monitoring, like Cacti[0] with SNMP or a similar setup,
so you can monitor the state of your server in detail without having
to stare at it.

[0] http://www.cacti.net/

Re: REINDEX takes half a day (and still not complete!)

From
Shashank Tripathi
Date:
People are running larger InnoDB databases on poorer hardware. Note
that I wouldn't dream of it because I care about data integrity and
stability, but this discussion is purely about performance and I know
it is possible.

I am sure throwing hardware at it is not the solution; I'm just trying to
highlight what the root cause is. Raptor disks are not that bad, even
if there's effectively just "one" disk with RAID1, especially for a
SELECT-heavy web app.

Scott's idea of upgrading to 9.x is a good one, but it hasn't been easy
in the past. There have been issues related to UTF-8, even after the
whole RPM situation on CentOS was sorted out.

QUESTION:
If autovacuum is ON and I'm running a manual vacuum, will they
conflict with each other, or will one of them basically wait for the
other to finish?



On Mon, Apr 18, 2011 at 2:13 AM, Alan Hodgson <ahodgson@simkin.ca> wrote:
> On April 17, 2011, Phoenix <phoenix.kiula@gmail.com> wrote:
>
>> >> Surely this is not tenable for enterprise environments? I am on a
>
>> >> 64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
>
>> >> called. Postgres is 8.2.9.
>
>> >>
>
> .. and you have essentially 1 disk drive. Your hardware is not sized for a
> database server.
>
>>> it's a RAID 1 setup. Two Raptor 10000rpm disks.

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Sun, Apr 17, 2011 at 11:19 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Btw, hardware is not an issue. My db has been working fine for a
> while. Smaller poorer systems around the web run InnoDB databases. I
> wouldn't touch that with a barge pole.

Did you or someone in an earlier post say that you didn't have
problems with table bloat?  I can't remember for sure.

Anyway if it's not hardware then it's drivers or your OS.  The output
of iostat is abysmally bad.  100% utilization but actual throughput is
pretty low.  Have you used the CLI utility for your RAID card to check
for possible problems or errors?  Maybe your battery is dead or
non-functioning?  Don't just rule out hardware until you're sure yours
is working well.

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Mon, Apr 18, 2011 at 1:26 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Apr 17, 2011 at 11:19 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Btw, hardware is not an issue. My db has been working fine for a
>> while. Smaller poorer systems around the web run InnoDB databases. I
>> wouldn't touch that with a barge pole.
>
> Did you or someone in an earlier post say that you didn't have
> problems with table bloat?  I can't remember for sure.
>
> Anyway if it's not hardware then it's drivers or your OS.  The output
> of iostat is abysmally bad.  100% utilization but actual throughput is
> pretty low.  Have you used the CLI utility for your RAID card to check
> for possible problems or errors?  Maybe your battery is dead or
> non-functioning?  Don't just rule out hardware until you're sure yours
> is working well.

For instance, here is what I get from iostat on my very CPU bound 8
core opteron machine with a battery backed caching controller:

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     9.50  0.30 11.20     2.40  1826.40   159.03     0.01    0.54   0.50   0.58
sdb              42.40   219.80 114.60 41.10 27982.40  2088.80   193.14     0.26    1.67   1.42  22.16

Note that sda is the system / pg_xlog drive, and sdb is the /data/base
dir, minus pg_xlog.  I'm reading ~19MB/s and writing ~1MB/s on sdb and
that's using 22% of the IO approximately.  My CPUs are all pegged at
100% and I'm getting ~2500 tps.

I'm betting pgbench on your system will get something really low like
200 tps and be maxing out your %util.

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Sun, Apr 17, 2011 at 11:19 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Btw, hardware is not an issue. My db has been working fine for a
> while. Smaller poorer systems around the web run InnoDB databases. I
> wouldn't touch that with a barge pole.
>
> I have a hardware RAID controller, not "fake". It's a good quality
> battery-backed 3Ware:
> http://192.19.193.26/products/serial_ata2-9000.asp

(please stop top posting)

Also, when you run top and hit c, what do those various postgres
processes say they're doing?  bgwriter, SELECT, VACUUM etc?

Re: REINDEX takes half a day (and still not complete!)

From
Phoenix Kiula
Date:
On Mon, Apr 18, 2011 at 3:38 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Apr 17, 2011 at 11:19 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Btw, hardware is not an issue. My db has been working fine for a
>> while. Smaller poorer systems around the web run InnoDB databases. I
>> wouldn't touch that with a barge pole.
>>
>> I have a hardware RAID controller, not "fake". It's a good quality
>> battery-backed 3Ware:
>> http://192.19.193.26/products/serial_ata2-9000.asp
>
> (please stop top posting)
>
> Also, when you run top and hit c what do those various postgres
> processes say they're doing?  bgwriter, SELECT, VACUMM etc?
>





Thanks. But let me do the "top" stuff later. I think I have a bigger
problem now.

While doing a PG dump, I seem to get this error:

    ERROR: invalid memory alloc request size 4294967293

Upon googling, this seems to be a data corruption issue!

One of the older messages suggests that I do "file level backup and
restore the data" -
http://archives.postgresql.org/pgsql-admin/2008-05/msg00191.php

How does one do this -- should I copy the data folder? What are the
specific steps to restore from here -- would I simply copy the files
from the data folder back to the new install or something? Can't find
these steps in the PG documentation.

I'm on PG 8.2.9, CentOS 5, with 8GB of RAM.

Re: REINDEX takes half a day (and still not complete!)

From
Scott Marlowe
Date:
On Mon, Apr 18, 2011 at 1:45 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> On Mon, Apr 18, 2011 at 3:38 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Sun, Apr 17, 2011 at 11:19 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>>> Btw, hardware is not an issue. My db has been working fine for a
>>> while. Smaller poorer systems around the web run InnoDB databases. I
>>> wouldn't touch that with a barge pole.
>>>
>>> I have a hardware RAID controller, not "fake". It's a good quality
>>> battery-backed 3Ware:
>>> http://192.19.193.26/products/serial_ata2-9000.asp
>>
>> (please stop top posting)
>>
>> Also, when you run top and hit c what do those various postgres
>> processes say they're doing?  bgwriter, SELECT, VACUMM etc?
>>
>
>
>
>
>
> Thanks. But let me do the "top" stuff later. I think I have a bigger
> problem now.
>
> While doing a PG dump, I seem to get this error:
>
>    ERROR: invalid memory alloc request size 4294967293
>
> Upon googling, this seems to be a data corruption issue!
>
> One of the older messages suggests that I do "file level backup and
> restore the data" -
> http://archives.postgresql.org/pgsql-admin/2008-05/msg00191.php
>
> How does one do this -- should I copy the data folder? What are the
> specific steps to restore from here, would I simply copy the files
> from the data folder back to the new install or something? Cant find
> these steps in the PG documentation.
>
> I'm on PG 8.2.9, CentOS 5, with 8GB of RAM.

I wonder if you've got a drive going bad (or both of them).  What does
your RAID card have to say about the drives?

To do a file-level backup, set up another machine on the same network,
with enough space on a drive with write access for the account you
want to back up to.  Shut down the Postgres server (sudo
/etc/init.d/postgresql stop or something like that), then use rsync
-avl /data/pgdir remoteserver:/newdatadir/ to back it up.  You want to
start with that so you can at least get back to where you are now if
things go wrong.

Also, after that, run memtest86+ to make sure you don't have memory errors.

Re: REINDEX takes half a day (and still not complete!)

From
Sethu Prasad
Date:

Do you mean maintenance rather than recovery? If yes:

The following types of administration commands are not accepted during recovery mode:

  • Data Definition Language (DDL) - e.g. CREATE INDEX

  • Privilege and Ownership - GRANT, REVOKE, REASSIGN

  • Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX

Thanks.


On Sun, Apr 17, 2011 at 5:30 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
Sorry, rejuvenating a thread that was basically unanswered.

I closed the database for any kinds of access to focus on maintenance
operations, killed all earlier processes so that my maintenance is the
only stuff going on.

REINDEX is still taking 3 hours -- and it is still not finished!

Similarly, if I cancel the REINDEX and issue a VACUUM ANALYZE VERBOSE,
this too seems to just hang there on my big table.

I changed the maintenance_work_men to 2GB for this operation. It's
highly worrisome -- the above slow times are with 2GB of my server
dedicated to Postgresql!!!!

Surely this is not tenable for enterprise environments? I am on a
64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
called. Postgres is 8.2.9.

How do DB folks do this with small maintenance windows? This is for a
very high traffic website so it's beginning to get embarrassing.

Would appreciate any thoughts or pointers.

Thanks!



On Mon, Mar 21, 2011 at 9:28 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 10:07 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> I have a large table but not as large as the kind of numbers that get
>> discussed on this list. It has 125 million rows.
>>
>> REINDEXing the table takes half a day, and it's still not finished.
>>
>> To write this post I did "SELECT COUNT(*)", and here's the output -- so long!
>>
>>    select count(*) from links;
>>       count
>>    -----------
>>     125418191
>>    (1 row)
>>
>>    Time: 1270405.373 ms
>>
>> That's 1270 seconds!
>>
>> I suppose the vaccuum analyze is not doing its job? As you can see
>> from settings below, I have autovacuum set to ON, and there's also a
>> cronjob every 10 hours to do a manual vacuum analyze on this table,
>> which is largest.
>>
>> PG is version 8.2.9.
>>
>> Any thoughts on what I can do to improve performance!?
>>
>> Below are my settings.
>>
>>
>>
>> max_connections              = 300
>> shared_buffers               = 500MB
>> effective_cache_size         = 1GB
>> max_fsm_relations            = 1500
>> max_fsm_pages                = 950000
>>
>> work_mem                     = 100MB
>> temp_buffers                 = 4096
>> authentication_timeout       = 10s
>> ssl                          = off
>> checkpoint_warning           = 3600
>> random_page_cost             = 1
>>
>> autovacuum                   = on
>> autovacuum_vacuum_cost_delay = 20
>>
>> vacuum_cost_delay            = 20
>> vacuum_cost_limit            = 600
>>
>> autovacuum_naptime           = 10
>> stats_start_collector        = on
>> stats_row_level              = on
>> autovacuum_vacuum_threshold  = 75
>> autovacuum_analyze_threshold = 25
>> autovacuum_analyze_scale_factor  = 0.02
>> autovacuum_vacuum_scale_factor   = 0.01
>>
>> wal_buffers                  = 64
>> checkpoint_segments          = 128
>> checkpoint_timeout           = 900
>> fsync                        = on
>> maintenance_work_mem         = 512MB
>
> how much memory do you have? you might want to consider raising
> maintenance_work_mem to 1GB.  Are other things going on in the
> database while you are rebuilding your indexes?  Is it possible you
> are blocked waiting on a lock for a while?
>
> How much index data is there?  Can we see the table definition along
> with create index statements?
>
> merlin
>


Re: REINDEX takes half a day (and still not complete!)

From
tv@fuzzy.cz
Date:
> Thanks. But let me do the "top" stuff later. I think I have a bigger
> problem now.
>
> While doing a PG dump, I seem to get this error:
>
>     ERROR: invalid memory alloc request size 4294967293
>
> Upon googling, this seems to be a data corruption issue!
>
> One of the older messages suggests that I do "file level backup and
> restore the data" -
> http://archives.postgresql.org/pgsql-admin/2008-05/msg00191.php
>
> How does one do this -- should I copy the data folder? What are the
> specific steps to restore from here, would I simply copy the files
> from the data folder back to the new install or something? Cant find
> these steps in the PG documentation.

Just stop the database, and copy the 'data' directory somewhere else (to a
different machine preferably). You can then start the database from this
directory copy (not sure how that works in CentOS, but you can always run
"postmaster -D directory").

>
> I'm on PG 8.2.9, CentOS 5, with 8GB of RAM.
>

This is a massive thread (and part of the important info is in another
thread on other mailing lists), so maybe I've missed something important,
but it seems like:

1) You're I/O bound (according to the 100% utilization reported by iostat).

2) Well, you're running a RAID1 setup, which basically means it's 1 drive
(and you're doing a reindex, which means a lot of reads/writes).

3) The raid controller should handle this, unless it's broken, the battery
is empty (and thus the writes are not cached) or something like that. I'm
not that familiar with 3ware - is there any diagnostic tool that you use
to check the health of the controller / drives?

4) I know you've mentioned there is no bloat (according to off-the-list
discussion with Merlin) - is this true for the table only? Because if the
index is not bloated, then there's no point in running reindex ...

BTW, what is the size of the database and that big table? I know it's 125
million rows, but how much is that -- 1GB, 1TB, ... how much? What does
this return:

  SELECT reltuples FROM pg_class WHERE relname = 'links';
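
And for the on-disk footprint, something like this (a sketch -- these size
functions exist in 8.2 as well):

  SELECT pg_size_pretty(pg_total_relation_size('links'))      AS links_incl_indexes,
         pg_size_pretty(pg_database_size(current_database())) AS whole_database;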

Do you have any pg_dump backups? What size are they, compared to the live
database? Have you tried to rebuild the database from these backups? That
would give you fresh indexes, so you could see how a 'perfectly clean'
database looks (whether the indexes are bloated, what speed is expected, etc.).

regards
Tomas


Re: REINDEX takes half a day (and still not complete!)

From
Robert Haas
Date:
On Apr 17, 2011, at 11:30 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Sorry, rejuvenating a thread that was basically unanswered.
>
> I closed the database for any kinds of access to focus on maintenance
> operations, killed all earlier processes so that my maintenance is the
> only stuff going on.
>
> REINDEX is still taking 3 hours -- and it is still not finished!
>
> Similarly, if I cancel the REINDEX and issue a VACUUM ANALYZE VERBOSE,
> this too seems to just hang there on my big table.
>
> I changed the maintenance_work_men to 2GB for this operation. It's
> highly worrisome -- the above slow times are with 2GB of my server
> dedicated to Postgresql!!!!
>
> Surely this is not tenable for enterprise environments? I am on a
> 64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
> called. Postgres is 8.2.9.
>
> How do DB folks do this with small maintenance windows? This is for a
> very high traffic website so it's beginning to get embarrassing.
>
> Would appreciate any thoughts or pointers.

An upgrade would probably help you a lot, and as others have said it sounds like your hardware is failing, so you probably want to deal with that first.

I am a bit surprised, however, that no one seems to have mentioned using CLUSTER rather than VACUUM or REINDEX. Sometimes that's worth a try...
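
For reference, a sketch of the syntax with the table and index named earlier in the thread (the older form is what 8.2 accepts; the USING form arrived in 8.3):

    CLUSTER new_idx_userid ON links;      -- 8.2 and earlier
    CLUSTER links USING new_idx_userid;   -- 8.3 and later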

...Robert

Re: REINDEX takes half a day (and still not complete!)

From
Greg Smith
Date:
On 04/23/2011 03:44 PM, Robert Haas wrote:
> On Apr 17, 2011, at 11:30 AM, Phoenix Kiula<phoenix.kiula@gmail.com>  wrote:
>
>> Postgres is 8.2.9.
>>
>>
> An upgrade would probably help you a lot, and as others have said it sounds like your hardware is failing, so you probably want to deal with that first.
>
> I am a bit surprised, however, that no one seems to have mentioned using CLUSTER rather than VACUUM or REINDEX. Sometimes that's worth a try...
>

Don't know if that's the reason others didn't mention it, but CLUSTER
isn't so great in 8.2.  The whole "not MVCC-safe" bit does not inspire
confidence on a production server.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: REINDEX takes half a day (and still not complete!)

From
Phoenix Kiula
Date:
On Sat, Apr 30, 2011 at 4:07 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 04/23/2011 03:44 PM, Robert Haas wrote:
>>
>> On Apr 17, 2011, at 11:30 AM, Phoenix Kiula<phoenix.kiula@gmail.com>
>>  wrote:
>>
>>>
>>> Postgres is 8.2.9.
>>>
>>>
>>
>> An upgrade would probably help you a lot, and as others have said it
>> sounds like your hardware is failing, so you probably want to deal with that
>> first.
>>
>> I am a bit surprised, however, that no one seems to have mentioned using
>> CLUSTER rather than VACUUM or REINDEX. Sometimes that's worth a try...
>>
>
> Don't know if it was for this reason or not for not mentioning it by others,
> but CLUSTER isn't so great in 8.2.  The whole "not MVCC-safe" bit does not
> inspire confidence on a production server.




To everyone. Thanks so much for everything, truly. We have managed to
salvage the data by exporting it in bits and pieces.

1. First the schema only
2. Then pg_dump of specific small tables
3. Then pg_dump of timed bits of the big mammoth table

Not to jinx it, but the newer hardware seems to be doing well. I am on
9.0.4 now and it's pretty fast.

Also, as has been mentioned in this thread and other discussions on
the list, just doing a dump and then fresh reload has compacted the DB
to nearly 1/3rd of its previously reported size!

I suppose that's what I am going to do on a periodic basis from now
on. There is a lot of DELETE/UPDATE activity. But I wonder if the
vacuum stuff really should do something that's similar in function?
What do the high-end enterprise folks do -- surely they can't be
dumping/restoring every quarter or so....or are they?

Anyway, many many thanks to the lovely folks on this list. Much appreciated!

Re: REINDEX takes half a day (and still not complete!)

From
Kenneth Marshall
Date:
On Sat, Apr 30, 2011 at 05:26:36PM +0800, Phoenix Kiula wrote:
> On Sat, Apr 30, 2011 at 4:07 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> > On 04/23/2011 03:44 PM, Robert Haas wrote:
> >>
> >> On Apr 17, 2011, at 11:30 AM, Phoenix Kiula<phoenix.kiula@gmail.com>
> >> wrote:
> >>
> >>>
> >>> Postgres is 8.2.9.
> >>>
> >>>
> >>
> >> An upgrade would probably help you a lot, and as others have said it
> >> sounds like your hardware is failing, so you probably want to deal with that
> >> first.
> >>
> >> I am a bit surprised, however, that no one seems to have mentioned using
> >> CLUSTER rather than VACUUM or REINDEX. Sometimes that's worth a try...
> >>
> >
> > Don't know if it was for this reason or not for not mentioning it by others,
> > but CLUSTER isn't so great in 8.2.  The whole "not MVCC-safe" bit does not
> > inspire confidence on a production server.
>
>
>
>
> To everyone. Thanks so much for everything, truly. We have managed to
> salvage the data by exporting it in bits and pieces.
>
> 1. First the schema only
> 2. Then pg_dump of specific small tables
> 3. Then pg_dump of timed bits of the big mammoth table
>
> Not to jinx it, but the newer hardware seems to be doing well. I am on
> 9.0.4 now and it's pretty fast.
>
> Also, as has been mentioned in this thread and other discussions on
> the list, just doing a dump and then fresh reload has compacted the DB
> to nearly 1/3rd of its previously reported size!
>
> I suppose that's what I am going to do on a periodic basis from now
> on. There is a lot of DELETE/UPDATE activity. But I wonder if the
> vacuum stuff really should do something that's similar in function?
> What do the high-end enterprise folks do -- surely they can't be
> dumping/restoring every quarter or so....or are they?
>
> Anyway, many many thanks to the lovely folks on this list. Much appreciated!
>

The autovacuum and space management in 9.0 is dramatically more effective
and efficient than that of 8.2. Unless you have an odd corner case, there
really should be no reason for a periodic dump/restore. This is not your
grandmother's Oldsmobile... :)

Regards,
Ken

Re: REINDEX takes half a day (and still not complete!)

From
Jim Nasby
Date:
On Apr 30, 2011, at 9:34 AM, Kenneth Marshall wrote:
>> I suppose that's what I am going to do on a periodic basis from now
>> on. There is a lot of DELETE/UPDATE activity. But I wonder if the
>> vacuum stuff really should do something that's similar in function?
>> What do the high-end enterprise folks do -- surely they can't be
>> dumping/restoring every quarter or so....or are they?
>>
>> Anyway, many many thanks to the lovely folks on this list. Much appreciated!
>>
>
> The autovacuum and space management in 9.0 is dramatically more effective
> and efficient then that of 8.2. Unless you have an odd corner-case there
> really should be no reason for a periodic dump/restore. This is not your
> grandmother's Oldsmobile... :)

In 10+ years of using Postgres, I've never come across a case where you actually *need* to dump and restore on a
regular basis. However, you can certainly run into scenarios where vacuum simply can't keep up. If your restored
database is 1/3 the size of the original then this is certainly what was happening on your 8.2 setup.

As Kenneth mentioned, 9.0 is far better in this regard than 8.2, though it's still possible that you're doing something
that will give it fits. I suggest that you run a weekly vacuumdb -av, capture that output and run it through pgFouine.
That will give you a ton of useful information about the amount of bloat you have in each table. I would definitely look
at anything with over 20% bloat.

BTW, in case you're still questioning using Postgres in an enterprise setting: all of our production OLTP databases run
on Postgres. The largest one is ~1.5TB and does over 650TPS on average (with peaks that are much higher). Unplanned
downtime on that database would cost us well over $100k/hour, and we're storing financial information, so data quality
issues are not an option (data quality was one of the primary reasons we moved away from MySQL in 2006). So yes, you can
absolutely run very large Postgres databases in a high-workload environment. BTW, that's also on version 8.3.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: REINDEX takes half a day (and still not complete!)

From
Merlin Moncure
Date:
On Sat, Apr 30, 2011 at 4:26 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> I suppose that's what I am going to do on a periodic basis from now
> on. There is a lot of DELETE/UPDATE activity. But I wonder if the
> vacuum stuff really should do something that's similar in function?
> What do the high-end enterprise folks do -- surely they can't be
> dumping/restoring every quarter or so....or are they?

The pg_reorg tool (google it) can rebuild a live table without taking
major locks.  It's better to try and engineer your database so that you
have enough spare I/O to manage 1-2 continuously running vacuums, but if
things get really out of whack it's there.

merlin