Re: REINDEX takes half a day (and still not complete!) - Mailing list pgsql-performance

From Phoenix
Subject Re: REINDEX takes half a day (and still not complete!)
Date
Msg-id BANLkTi=j-+Cauczk-q+pJuASp_PPOjXgEQ@mail.gmail.com
In response to Re: REINDEX takes half a day (and still not complete!)  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: REINDEX takes half a day (and still not complete!)  (Alan Hodgson <ahodgson@simkin.ca>)
Re: REINDEX takes half a day (and still not complete!)  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-performance
Thanks Scott.

I have shared a large amount of information in my off-list emails to
Merlin and you; I intentionally did not post it publicly. Apologies if
you are getting tired of this thread.

The reason I need to REINDEX is that a simple SELECT on the indexed
column is taking ages; it used to complete in under a second. At the
very least, I want to make sure the index is properly in place.
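
To double-check whether the planner is even using that index, a query
plan along these lines should show either an index scan or a sequential
scan (a minimal sketch; "links" and "alias" are placeholder table and
column names, not my real ones):

EXPLAIN ANALYZE
SELECT * FROM links WHERE alias = 'some-key';

A Seq Scan on the big table instead of an Index Scan would go a long
way toward explaining the slow response times.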

We went through some bloat reports off-list; Merlin's conclusion was
that there is no significant bloat.
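
If it helps, I can re-run a rough size check along these lines (a
sketch; "links" and "links_alias_idx" are placeholder names for the big
table and its index, and pg_relation_size()/pg_size_pretty() are
available in 8.2):

SELECT relname, relpages, reltuples,
       pg_size_pretty(pg_relation_size(oid::regclass)) AS on_disk
FROM pg_class
WHERE relname IN ('links', 'links_alias_idx');

A relpages figure far larger than the live reltuples count would
justify is the kind of thing the earlier bloat reports were looking for.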

A manual VACUUM right now takes ages too. AUTOVACUUM settings are below.
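
One thing I can try to narrow this down is vacuuming only the big table
rather than the whole database (a sketch; "links" is again a placeholder
for the real table name):

VACUUM VERBOSE ANALYZE links;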

It's a RAID 1 setup with two 10,000 RPM Raptor disks.

top does not show much beyond "postmaster" for the PostgreSQL
processes. How should I be using top, and what information from it
would be useful to you? This is what it currently looks like (a query
to map these PIDs back to what each backend is running follows the
listing):

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14231 root      18   0  4028  872  728 R 93.8  0.0  28915:37 exim_dbmbuild
11001 root      25   0  4056  864  716 R 93.8  0.0  23111:06 exim_dbmbuild
16400 root      25   0  4824  864  720 R 92.5  0.0  33843:52 exim_dbmbuild
 4799 postgres  15   0  532m  94m  93m D  0.7  1.2   0:00.14 postmaster
12292 nobody    15   0 48020  14m 5088 S  0.7  0.2   0:00.06 httpd
12943 root      17   0  2828 1224  776 R  0.7  0.0   0:00.04 top
 7236 mysql     16   0  224m  64m 3692 S  0.3  0.8  26:43.46 mysqld
31421 postgres  15   0  530m  12m  12m S  0.3  0.2   0:03.08 postmaster
31430 postgres  15   0 10456  576  224 S  0.3  0.0   0:00.08 postmaster
  955 postgres  15   0  532m  91m  90m S  0.3  1.1   0:00.15 postmaster
 1054 postgres  15   0  532m 196m 195m S  0.3  2.4   0:00.37 postmaster
 1232 postgres  15   0  532m  99m  98m D  0.3  1.2   0:00.27 postmaster
 1459 postgres  15   0  532m  86m  85m S  0.3  1.1   0:00.12 postmaster
 4552 postgres  15   0  532m  86m  85m S  0.3  1.1   0:00.08 postmaster
 7187 postgres  15   0  532m 157m 155m S  0.3  1.9   0:00.19 postmaster
 7587 postgres  15   0  532m 175m 173m D  0.3  2.2   0:00.23 postmaster
 8131 postgres  15   0  532m 154m 152m S  0.3  1.9   0:00.15 postmaster
 9473 nobody    16   0 48268  15m 5800 S  0.3  0.2   0:00.34 httpd
 9474 nobody    15   0 48096  14m 5472 S  0.3  0.2   0:00.27 httpd
10688 nobody    16   0     0    0    0 Z  0.3  0.0   0:00.20 httpd <defunct>
12261 nobody    15   0 47956  13m 4296 S  0.3  0.2   0:00.08 httpd
12278 nobody    15   0 47956  13m 4052 S  0.3  0.2   0:00.04 httpd
12291 nobody    15   0 47972  14m 4956 S  0.3  0.2   0:00.07 httpd
12673 nobody    15   0 47912  13m 4180 S  0.3  0.2   0:00.02 httpd
12674 nobody    15   0 47936  13m 4924 S  0.3  0.2   0:00.02 httpd
12678 nobody    16   0 47912  13m 4060 S  0.3  0.2   0:00.01 httpd
12727 nobody    15   0 47912  13m 4024 S  0.3  0.2   0:00.03 httpd
12735 nobody    15   0 47912  13m 4144 S  0.3  0.2   0:00.02 httpd
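
To map those postmaster entries back to what each backend is actually
doing, something like this should work (a sketch; in 8.2 the relevant
pg_stat_activity columns are procpid and current_query, and
current_query is populated as long as stats_command_string is on, which
is the default):

SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;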


"vmstat 10" shows this:


 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 3 14  99552  17900  41108 7201712    0    0    42    11    0     0  8 34 41 16
 2 17  99552  16468  41628 7203012    0    0  1326    84 1437 154810  7 66 12 15
 3  7  99476  16796  41056 7198976    0    0  1398    96 1453 156211  7 66 21  6
 3 17  99476  17228  39132 7177240    0    0  1325    68 1529 156111  8 65 16 11




The output of "iostat -xd 10" is:



Device:    rrqm/s wrqm/s    r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda          0.24  24.55   9.33  4.41  111.31  231.75   55.65  115.88    24.97     0.17  12.09   6.67   9.17
sdb          0.06  97.65   2.21  3.97   91.59  389.58   45.80  194.79    77.84     0.06   9.95   2.73   1.69
sdc          1.46  62.71 187.20 29.13  132.43  311.72   66.22  155.86     2.05     0.36   1.65   1.12  24.33

Device:    rrqm/s wrqm/s    r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda          0.00   7.41   0.30  3.50    2.40   87.29    1.20   43.64    23.58     0.13  32.92  10.03   3.81
sdb          0.00   0.00   0.00  0.00    0.00    0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sdc          0.00  18.32 158.26  4.10 2519.32  180.98 1259.66   90.49    16.63    13.04  79.91   6.17 100.11

Device:    rrqm/s wrqm/s    r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda          0.00   6.21   0.00  1.40    0.00   60.86    0.00   30.43    43.43     0.03  20.07  15.00   2.10
sdb          0.00   0.00   0.00  0.00    0.00    0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sdc          0.10  10.31 159.06  2.50 2635.44  101.70 1317.72   50.85    16.94    12.82  79.44   6.20 100.12




The server has 8GB of memory in total, with about 1GB devoted to
PostgreSQL during these operations. Otherwise, my settings are as
follows (and yes, I did make the vacuum settings more aggressive based
on your email, which has had no apparent impact so far) --

max_connections              = 350
shared_buffers               = 500MB
effective_cache_size         = 1250MB
max_fsm_relations            = 1500
max_fsm_pages                = 950000
work_mem                     = 100MB
maintenance_work_mem         = 200MB
temp_buffers                 = 4096
authentication_timeout       = 10s
ssl                          = off
checkpoint_warning           = 3600
random_page_cost             = 1



What else can I share?

Thanks much for offering to help.



On Sun, Apr 17, 2011 at 11:44 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Apr 17, 2011 at 9:30 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Sorry, rejuvenating a thread that was basically unanswered.
>>
>> I closed the database for any kinds of access to focus on maintenance
>> operations, killed all earlier processes so that my maintenance is the
>> only stuff going on.
>>
>> REINDEX is still taking 3 hours -- and it is still not finished!
>>
>> Similarly, if I cancel the REINDEX and issue a VACUUM ANALYZE VERBOSE,
>> this too seems to just hang there on my big table.
>>
>> I changed the maintenance_work_mem to 2GB for this operation. It's
>> highly worrisome -- the above slow times are with 2GB of my server
>> dedicated to Postgresql!!!!
>>
>> Surely this is not tenable for enterprise environments? I am on a
>> 64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
>> called. Postgres is 8.2.9.
>>
>> How do DB folks do this with small maintenance windows? This is for a
>> very high traffic website so it's beginning to get embarrassing.
>>
>> Would appreciate any thoughts or pointers.
>
> Upgrade to something more modern than 8.2.x.  Autovacuum was still
> very much in its infancy back then.  9.0 or higher is a good choice.
> What do iostat -xd 10 and vmstat 10 and top say about these processes
> when they're running?  "It's taking a really long time and seems like
> it's hanging" tells us nothing useful.  Your OS has tools to let you
> figure out what's bottlenecking your operations, so get familiar with
> them and let us know what they tell you.  These are all suggestions I
> made before which you have now classified as "not answering your
> questions" so I'm getting a little tired of helping you when you don't
> seem interested in helping yourself.
>
> What are your vacuum and autovacuum costing values set to?  Can you
> make vacuum and / or autovacuum more aggressive?
>
