Reindex taking forever, and 99% CPU - Mailing list pgsql-general

From Phoenix Kiula
Subject Reindex taking forever, and 99% CPU
Date
Msg-id CAFWfU=vhy2dLmMFX-c6d21C3w4VabL2F97BnLcnMedO8HDAO7Q@mail.gmail.com
Whole thread Raw
Responses Re: Reindex taking forever, and 99% CPU  (Adrian Klaver <adrian.klaver@aklaver.com>)
Re: Reindex taking forever, and 99% CPU  (John R Pierce <pierce@hogranch.com>)
Re: Reindex taking forever, and 99% CPU  (Jeff Janes <jeff.janes@gmail.com>)
Re: Reindex taking forever, and 99% CPU  (Alexey Klyukin <alexk@hintbits.com>)
List pgsql-general
Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.

One of my large tables (101 GB on disk, about 1.1 billion rows) used
to take too long to vacuum. Not sure if it's an index corruption
issue. But I tried VACUUM FULL ANALYZE as recommended in another
thread yesterday, which took 5 hours on the two times I tried, without
finishing.

Now the REINDEX TABLE has taken over 6 hours as I decided to be
patient and just let something finish. Not sure this is normal though!
How do production level DBAs do this if it takes so long?

If I open another SSH window to my server and try "select * from
pg_stats_activity" it just hangs there, as the REINDEX I presume is
taking up all the memory? I basically can't do anything else on this
server.

Just in case it helps, a segment of my postgresql.conf is below. Would
appreciate any tips on what I can do.

(I did a pg_dump of just this table, which also took about 2 hours,
then I renamed the original table in the database, and tried to
pg_restore just the table, but it gave me an error message about the
archive being in the wrong format !!! So REINDEX or something like it
seems to be the only idea?)

Thanks for any help!

PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
and TOP output during the running of the REINDEX are below..


----POSTGRESQL.CONF-----

max_connections                 = 180
superuser_reserved_connections  = 5
shared_buffers                  = 512MB
effective_cache_size            = 1200MB
temp_buffers                    = 32MB
maintenance_work_mem            = 320MB
work_mem                        = 128MB
wal_buffers                     = 20MB
fsync                           = on
checkpoint_segments             = 128
checkpoint_timeout              = 1000
enable_indexscan                = on

# AUTOVAC
autovacuum                      = on
autovacuum_max_workers          = 5      # max number of autovacuum subprocesses
  #autovacuum_vacuum_scale_factor  = 0.2    # fraction of table size
before vacuum
autovacuum_vacuum_cost_delay    = 20ms
autovacuum_vacuum_cost_limit    = 350
...





------TOP OUTPUT (db name changed for privacy, with the word "MYDOMAIN") -------

top - 21:18:51 up 22 days,  7:43,  2 users,  load average: 1.20, 1.17, 1.18
Tasks: 214 total,   3 running, 211 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.1%us,  1.6%sy,  0.0%ni, 71.9%id,  1.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4046644k total,  4022324k used,    24320k free,     9880k buffers
Swap:  2096440k total,   177144k used,  1919296k free,  2526536k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21044 postgres  25   0 1102m 513m  76m R 97.7 13.0 432:03.46 postgres:
MYDOMAIN_MYDOMAIN MYDOMAIN [local] REINDEX
 8812 root      18   0 1403m  53m 3344 S  2.3  1.4 377:33.38
./jre/bin/java -Djava.compiler=NONE -cp /usr/StorMan/RaidMan.jar com.
 8319 named     24   0  317m  37m 1860 S  1.3  0.9 319:11.26
/usr/sbin/named -u named -4 -t /var/named/chroot
14184 nobody    15   0  266m  15m 5156 S  1.0  0.4   4:13.43 nginx:
worker process
14181 nobody    15   0  279m  34m 5160 S  0.7  0.9   4:13.93 nginx:
worker process
30285 root      15   0 12760 1188  820 R  0.7  0.0   0:00.03 top
  282 root      10  -5     0    0    0 S  0.3  0.0 184:37.48 [kswapd0]
25093 nobody    16   0  334m  15m 5124 S  0.3  0.4   0:01.00
/usr/local/apache/bin/httpd -k restart -DSSL
25095 nobody    15   0  334m  15m 5256 S  0.3  0.4   0:00.94
/usr/local/apache/bin/httpd -k restart -DSSL
25102 nobody    15   0  334m  15m 5120 S  0.3  0.4   0:00.93
/usr/local/apache/bin/httpd -k restart -DSSL
25106 nobody    15   0  334m  15m 5416 S  0.3  0.4   0:00.99
/usr/local/apache/bin/httpd -k restart -DSSL
25109 nobody    15   0  334m  15m 5424 S  0.3  0.4   0:00.94
/usr/local/apache/bin/httpd -k restart -DSSL
25113 nobody    16   0  334m  15m 4980 S  0.3  0.4   0:00.93
/usr/local/apache/bin/httpd -k restart -DSSL
25115 nobody    16   0  334m  15m 5192 S  0.3  0.4   0:00.95
/usr/local/apache/bin/httpd -k restart -DSSL
25117 nobody    16   0  334m  15m 4988 S  0.3  0.4   0:00.97
/usr/local/apache/bin/httpd -k restart -DSSL
25119 nobody    16   0  334m  15m 5028 S  0.3  0.4   0:00.96
/usr/local/apache/bin/httpd -k restart -DSSL
31759 root      15   0     0    0    0 S  0.3  0.0   0:35.37 [pdflush]
    1 root      15   0 10368  592  556 S  0.0  0.0   0:04.29 init [3]
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:06.24
[migration/0]
    3 root      34  19     0    0    0 S  0.0  0.0   0:08.72
[ksoftirqd/0]
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00
[watchdog/0]
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:05.27
[migration/1]
    6 root      34  19     0    0    0 S  0.0  0.0   3:49.89
[ksoftirqd/1]
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/1]


pgsql-general by date:

Previous
From: Phoenix Kiula
Date:
Subject: Re: Upgrading from 9.0.11 to 9.3.5 on CentOS 6 (64 bit)
Next
From: Adrian Klaver
Date:
Subject: Re: Reindex taking forever, and 99% CPU