Thread: Reindex taking forever, and 99% CPU
Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.

One of my large tables (101 GB on disk, about 1.1 billion rows) used to take too long to vacuum. I'm not sure if it's an index corruption issue, but I tried VACUUM FULL ANALYZE as recommended in another thread yesterday; it ran for 5 hours on each of the two times I tried, without finishing.

Now the REINDEX TABLE has been running for over 6 hours, as I decided to be patient and just let something finish. I'm not sure this is normal, though. How do production-level DBAs do this if it takes so long?

If I open another SSH window to my server and try "select * from pg_stats_activity", it just hangs there; I presume the REINDEX is taking up all the memory? I basically can't do anything else on this server.

Just in case it helps, a segment of my postgresql.conf is below. I would appreciate any tips on what I can do.

(I did a pg_dump of just this table, which also took about 2 hours, then I renamed the original table in the database and tried to pg_restore just the table, but it gave me an error message about the archive being in the wrong format! So REINDEX or something like it seems to be the only idea?)

Thanks for any help!

PS: CentOS 6 64-bit, 4 GB of RAM, RAID 1 Raptor disks. postgresql.conf and top output during the running of the REINDEX are below.

----POSTGRESQL.CONF-----

max_connections = 180
superuser_reserved_connections = 5
shared_buffers = 512MB
effective_cache_size = 1200MB
temp_buffers = 32MB
maintenance_work_mem = 320MB
work_mem = 128MB
wal_buffers = 20MB
fsync = on
checkpoint_segments = 128
checkpoint_timeout = 1000
enable_indexscan = on

# AUTOVAC
autovacuum = on
autovacuum_max_workers = 5              # max number of autovacuum subprocesses
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 350
...

------TOP OUTPUT (db name changed for privacy, with the word "MYDOMAIN")-------

top - 21:18:51 up 22 days, 7:43, 2 users, load average: 1.20, 1.17, 1.18
Tasks: 214 total, 3 running, 211 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.1%us, 1.6%sy, 0.0%ni, 71.9%id, 1.1%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 4046644k total, 4022324k used, 24320k free, 9880k buffers
Swap: 2096440k total, 177144k used, 1919296k free, 2526536k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21044 postgres 25  0 1102m 513m  76m R 97.7 13.0 432:03.46 postgres: MYDOMAIN_MYDOMAIN MYDOMAIN [local] REINDEX
 8812 root     18  0 1403m  53m 3344 S  2.3  1.4 377:33.38 ./jre/bin/java -Djava.compiler=NONE -cp /usr/StorMan/RaidMan.jar com.
 8319 named    24  0  317m  37m 1860 S  1.3  0.9 319:11.26 /usr/sbin/named -u named -4 -t /var/named/chroot
14184 nobody   15  0  266m  15m 5156 S  1.0  0.4   4:13.43 nginx: worker process
14181 nobody   15  0  279m  34m 5160 S  0.7  0.9   4:13.93 nginx: worker process
30285 root     15  0 12760 1188  820 R  0.7  0.0   0:00.03 top
  282 root     10 -5     0    0    0 S  0.3  0.0 184:37.48 [kswapd0]
25093 nobody   16  0  334m  15m 5124 S  0.3  0.4   0:01.00 /usr/local/apache/bin/httpd -k restart -DSSL
25095 nobody   15  0  334m  15m 5256 S  0.3  0.4   0:00.94 /usr/local/apache/bin/httpd -k restart -DSSL
25102 nobody   15  0  334m  15m 5120 S  0.3  0.4   0:00.93 /usr/local/apache/bin/httpd -k restart -DSSL
25106 nobody   15  0  334m  15m 5416 S  0.3  0.4   0:00.99 /usr/local/apache/bin/httpd -k restart -DSSL
25109 nobody   15  0  334m  15m 5424 S  0.3  0.4   0:00.94 /usr/local/apache/bin/httpd -k restart -DSSL
25113 nobody   16  0  334m  15m 4980 S  0.3  0.4   0:00.93 /usr/local/apache/bin/httpd -k restart -DSSL
25115 nobody   16  0  334m  15m 5192 S  0.3  0.4   0:00.95 /usr/local/apache/bin/httpd -k restart -DSSL
25117 nobody   16  0  334m  15m 4988 S  0.3  0.4   0:00.97 /usr/local/apache/bin/httpd -k restart -DSSL
25119 nobody   16  0  334m  15m 5028 S  0.3  0.4   0:00.96 /usr/local/apache/bin/httpd -k restart -DSSL
31759 root     15  0     0    0    0 S  0.3  0.0   0:35.37 [pdflush]
    1 root     15  0 10368  592  556 S  0.0  0.0   0:04.29 init [3]
    2 root     RT -5     0    0    0 S  0.0  0.0   0:06.24 [migration/0]
    3 root     34 19     0    0    0 S  0.0  0.0   0:08.72 [ksoftirqd/0]
    4 root     RT -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/0]
    5 root     RT -5     0    0    0 S  0.0  0.0   0:05.27 [migration/1]
    6 root     34 19     0    0    0 S  0.0  0.0   3:49.89 [ksoftirqd/1]
    7 root     RT -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/1]
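For reference, the activity view in 9.0 is named pg_stat_activity (and its columns are still procpid and current_query in that version). A minimal sketch of a query against it to see what the REINDEX backend is doing, assuming a superuser session can still get in:

SELECT procpid, usename, waiting, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;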
On 08/02/2014 06:20 PM, Phoenix Kiula wrote:
> Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.
>
> One of my large tables (101 GB on disk, about 1.1 billion rows) used
> to take too long to vacuum. Not sure if it's an index corruption
> issue. But I tried VACUUM FULL ANALYZE as recommended in another
> thread yesterday, which took 5 hours on the two times I tried, without
> finishing.
>
> Now the REINDEX TABLE has taken over 6 hours as I decided to be
> patient and just let something finish. Not sure this is normal though!
> How do production level DBAs do this if it takes so long?

So why the REINDEX?

> If I open another SSH window to my server and try "select * from
> pg_stats_activity" it just hangs there, as the REINDEX I presume is
> taking up all the memory? I basically can't do anything else on this
> server.
>
> Just in case it helps, a segment of my postgresql.conf is below. Would
> appreciate any tips on what I can do.
>
> (I did a pg_dump of just this table, which also took about 2 hours,
> then I renamed the original table in the database, and tried to
> pg_restore just the table, but it gave me an error message about the
> archive being in the wrong format !!! So REINDEX or something like it
> seems to be the only idea?)

Sounds to me like you did a plain-text dump and then tried to use pg_restore to restore it. One of the quirks of pg_dump/pg_restore is that if you make a plain-text dump, you need to feed it to psql, not pg_restore. That being said, I am not sure that increasing the size of your database by another 101 GB on what seems to be an overloaded machine is the answer.

> Thanks for any help!

I'm still not sure what the problem is that you are trying to solve. There was a reference to VACUUM issues, but not a lot of detail. Some more information on what specifically you were having issues with might lead to some clarity on where to go from here.

> PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
> and TOP output during the running of the REINDEX are below..

--
Adrian Klaver
adrian.klaver@aklaver.com
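A quick sketch of the difference Adrian describes; the database, table, and file names here are placeholders:

# Plain-text dump: restore it by feeding it to psql, not pg_restore.
pg_dump -t bigtable mydb > bigtable.sql
psql -d mydb -f bigtable.sql

# Custom-format dump: this is the format pg_restore expects.
pg_dump -Fc -t bigtable mydb > bigtable.dump
pg_restore -d mydb bigtable.dump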
On 8/2/2014 6:20 PM, Phoenix Kiula wrote:
> PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
> and TOP output during the running of the REINDEX are below..
>
> ----POSTGRESQL.CONF-----
>
> max_connections = 180
> superuser_reserved_connections = 5
> shared_buffers = 512MB
> effective_cache_size = 1200MB
> temp_buffers = 32MB
> maintenance_work_mem = 320MB
> work_mem = 128MB

With 4 GB of RAM and 180 connections, if you actually had all 180 connections busy at once, you could use over 180 times work_mem. 180 * 128MB on a 4GB machine would be fatal.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
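A rough way to see that worst case from inside the database (a sketch only; a single complex query can allocate work_mem more than once, so the real ceiling is even higher):

SELECT (SELECT setting::bigint FROM pg_settings WHERE name = 'work_mem')
     * (SELECT setting::bigint FROM pg_settings WHERE name = 'max_connections')
     / 1024 AS worst_case_work_mem_mb;   -- work_mem is reported in kB here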
Thanks John.

So what're the right settings? Anyway, right now PostgreSQL is servicing only one main connection, which is the REINDEX. All other stuff is switched off; no one else is connecting to the DB.

My issue with this table was that the vacuum process would stop at this table and take hours. So I thought something was wrong with this table. My version of PG was 9.0.11, and googling for similar issues brought up an old post by Tom Lane that suggested the poster of that thread upgrade. So now I have, and I am at 9.0.17. I recognize this is not 9.3.5, but I am not sure we have the appetite right now for a massive upgrade.

So what I'm trying to do is reindex this specific table.

> iostat

Linux 2.6.18-238.9.1.el5 (coco.MYDOMAIN.com)    08/02/2014

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.63    0.10    3.11   13.42    0.00   72.74

Device:   tps       Blk_read/s   Blk_wrtn/s    Blk_read      Blk_wrtn
sda       284.10    43828.59     5729.27       84628259628   11062603842
sda1        0.00        0.00        0.00              2272            10
sda2        3.08       44.97      989.21          86838949    1910058506
sda3        7.65      193.04       84.34         372745356     162860428
sda4        0.00        0.00        0.00                 6             0
sda5        1.58       31.15        6.84          60140845      13208874
sda6        0.82       20.28        0.88          39161138       1693104
sda7        1.10       15.59       24.32          30101692      46962204
sda8        2.77       44.88       20.07          86661146      38754800
sda9      267.11    43478.67     4603.61       83952607992    8889065916

On Sun, Aug 3, 2014 at 9:56 AM, John R Pierce <pierce@hogranch.com> wrote:
> On 8/2/2014 6:20 PM, Phoenix Kiula wrote:
>>
>> PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
>> and TOP output during the running of the REINDEX are below..
>>
>> ----POSTGRESQL.CONF-----
>>
>> max_connections = 180
>> superuser_reserved_connections = 5
>> shared_buffers = 512MB
>> effective_cache_size = 1200MB
>> temp_buffers = 32MB
>> maintenance_work_mem = 320MB
>> work_mem = 128MB
>
> With 4 GB of RAM and 180 connections, if you actually had all 180
> connections busy at once, you could use over 180 times work_mem.
> 180 * 128MB on a 4GB machine would be fatal.
>
> --
> john r pierce                                      37N 122W
> somewhere on the middle of the left coast
On 08/02/2014 07:02 PM, Phoenix Kiula wrote:
> Thanks John.
>
> So what're the right settings? Anyway, right now PostgreSQL is
> servicing only one main connection, which is the REINDEX. All other
> stuff is switched off; no one else is connecting to the DB.
>
> My issue with this table was that the vacuum process would stop at
> this table and take hours.

In your original post you said it was stopping on pg_class, so now I am confused.

> So I thought something was wrong with this
> table. My version of PG was 9.0.11, and googling for similar issues
> brought up an old post by Tom Lane that suggested the poster of that
> thread upgrade. So now I have, and I am at 9.0.17. I recognize this
> is not 9.3.5, but I am not sure we have the appetite right now for a
> massive upgrade.
>
> So what I'm trying to do is reindex this specific table.

--
Adrian Klaver
adrian.klaver@aklaver.com
> In your original post you said it was stopping on pg_class so now I am
> confused.

No need to be confused. The vacuum thing is a bit tricky for laymen like myself; "pg_class" seemed to be associated with this table. Anyway, even before the upgrade, the vacuum was stopping at this table and taking forever.

The question is: what now? Where can I give you information from? IOSTAT I've already shared.

Will the work_mem settings affect the manual REINDEX that's still running? What can I do to speed up the REINDEX? Should I change my autovacuum settings for this table specifically (it's the only mammoth table in the DB, and our main one)?

Thanks.
On 08/02/2014 07:37 PM, Phoenix Kiula wrote:
>> In your original post you said it was stopping on pg_class so now I am
>> confused.
>
> No need to be confused. The vacuum thing is a bit tricky for laymen
> like myself; "pg_class" seemed to be associated with this table.
> Anyway, even before the upgrade, the vacuum was stopping at this table
> and taking forever.

Well, pg_class is associated with all tables; it is the system catalog that holds information on tables, among other things. So what made you think pg_class is involved in your issue? I suspect you did not just pull that name out of thin air, and that it came from some log or message. Is that the case?

> The question is: what now? Where can I give you information from?
> IOSTAT I've already shared.
>
> Will the work_mem settings affect the manual REINDEX that's still
> running? What can I do to speed up the REINDEX? Should I change my
> autovacuum settings for this table specifically (it's the only mammoth
> table in the DB, and our main one)?

What would be helpful would be some information on the table itself. What is the schema definition? What are you storing in the table? What is its usage pattern: SELECT only, mixed INSERT/UPDATE/SELECT, etc.? At this point, semi-randomly changing settings and operations on this table would seem to be counterproductive.

> Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com
On 08/02/2014 07:37 PM, Phoenix Kiula wrote:
>> In your original post you said it was stopping on pg_class so now I am
>> confused.
>
> No need to be confused. The vacuum thing is a bit tricky for laymen
> like myself; "pg_class" seemed to be associated with this table.
> Anyway, even before the upgrade, the vacuum was stopping at this table
> and taking forever.
>
> The question is: what now? Where can I give you information from?
> IOSTAT I've already shared.
>
> Will the work_mem settings affect the manual REINDEX that's still
> running? What can I do to speed up the REINDEX? Should I change my
> autovacuum settings for this table specifically (it's the only mammoth
> table in the DB, and our main one)?

Adding to my previous post, some information from the statistics collector would be useful. See here for more information:

http://www.postgresql.org/docs/9.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE

For now, the output of:

SELECT * from pg_stat_user_tables where relname='your_table_name';

might prove helpful.

> Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com
On Saturday, August 2, 2014, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.
>
> One of my large tables (101 GB on disk, about 1.1 billion rows) used
> to take too long to vacuum.
Too long for what? Rome wasn't built in a day; it might not get vacuumed in a day either. So what?
> Not sure if it's an index corruption
> issue. But I tried VACUUM FULL ANALYZE as recommended in another
> thread yesterday, which took 5 hours on the two times I tried, without
> finishing.
>
> Now the REINDEX TABLE has taken over 6 hours as I decided to be
> patient and just let something finish. Not sure this is normal though!
> How do production level DBAs do this if it takes so long?
Generally speaking, we don't.
> If I open another SSH window to my server and try "select * from
> pg_stats_activity" it just hangs there, as the REINDEX I presume is
> taking up all the memory? I basically can't do anything else on this
> server.
Is this large table one of the system tables?
> Just in case it helps, a segment of my postgresql.conf is below. Would
> appreciate any tips on what I can do.
>
> (I did a pg_dump of just this table, which also took about 2 hours,
> then I renamed the original table in the database, and tried to
> pg_restore just the table, but it gave me an error message about the
> archive being in the wrong format !!! So REINDEX or something like it
> seems to be the only idea?)
The only idea in order to DO WHAT? So far the only problems we know about are the ones you are causing yourself, in an effort to fix some problem which we know nothing about, and which might not actually exist in the first place.
> Thanks for any help!
>
> PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
> and TOP output during the running of the REINDEX are below..
Does RAID 1 mean you only have 2 disks in your RAID? If so, that is woefully inadequate to your apparent workload. The amount of RAM doesn't inspire confidence, either. If you want to use this hardware, you need to re-calibrate what "patience" means. Do a "vacuum verbose" (NOT "full") of the large table, and let it run over a weekend, at least.
> ----POSTGRESQL.CONF-----
>
> max_connections = 180
That's probably absurd. If you have an application that loses track of its connections and doesn't actually try to make use of them, and you can't fix that application, and you have no evidence of other problems, then this might make sense, kind of, as a defensive measure. But since you are in an emergency, or think you are, you should lower this.
> maintenance_work_mem = 320MB
If the only thing running is the vacuum, you could give it a lot more memory than this, like 2 or 3 GB. But you should probably do that only in the session doing the "emergency" vacuum, not globally.
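A sketch of what that per-session approach might look like (the table name is a placeholder, and 2GB assumes nothing else needs the RAM at the time):

-- In the one session doing the emergency maintenance only:
SET maintenance_work_mem = '2GB';
VACUUM VERBOSE big_table;      -- plain vacuum, not VACUUM FULL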
> autovacuum_vacuum_cost_delay = 20ms
Is vacuum_cost_delay still the default of 0?
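To check, from a psql session:

SHOW vacuum_cost_delay;    -- 0 means manual VACUUMs run without cost-based throttling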
Cheers,
Jeff
On 08/03/2014 08:55 PM, Jeff Janes wrote:
> Does RAID 1 mean you only have 2 disks in your RAID? If so, that is
> woefully inadequate to your apparent workload. The amount of RAM
> doesn't inspire confidence, either.

Phoenix,

I agree that this is probably the core of the problem you're having. A 101 GB table on a system with so few disk resources and such a small amount of memory will take an absurdly long time to process. Vacuuming such a large table will take an extremely long time, and reindexing it will be an exercise in frustration and possibly days of waiting.

If you can't upgrade to better-equipped hardware, I strongly suggest implementing partitioning on the table. One of the reasons we apply partitioning to our larger tables (generally anything over 100M rows) is maintenance. If we ever need to bulk-modify, reindex, or do anything substantial to a table, it's much faster when the table isn't so immense. Even considering that our hardware vastly outclasses what you have, it still pays to keep table architecture "lean and mean."

Take a look here:

http://www.postgresql.org/docs/9.3/static/ddl-partitioning.html

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
On Sun, Aug 3, 2014 at 3:20 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.
>
> One of my large tables (101 GB on disk, about 1.1 billion rows) used
> to take too long to vacuum. Not sure if it's an index corruption
> issue. But I tried VACUUM FULL ANALYZE as recommended in another
> thread yesterday, which took 5 hours on the two times I tried, without
> finishing.
>
> Now the REINDEX TABLE has taken over 6 hours as I decided to be
> patient and just let something finish. Not sure this is normal though!
> How do production level DBAs do this if it takes so long?
>
> If I open another SSH window to my server and try "select * from
> pg_stats_activity" it just hangs there, as the REINDEX I presume is
> taking up all the memory? I basically can't do anything else on this
> server.
From my experience, a REINDEX on a 100 GB table with such hardware will definitely take hours. It might actually be CPU-bound rather than I/O-bound if you have a large functional index on the table (like lower(text_column)), and since PostgreSQL can only make use of a single core for the rebuild, you are out of luck there.

In order to speed up the process without locking your data, you may consider creating the new index with CREATE INDEX CONCURRENTLY and then just dropping the old one (make sure your DROP won't sit waiting to acquire a lock when you do it, otherwise other processes will start to queue up behind it).
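A minimal sketch of that swap, with purely illustrative table/index/column names (note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

CREATE INDEX CONCURRENTLY big_table_col_idx_new ON big_table (col);

SET statement_timeout = '5s';       -- give up instead of queueing behind other sessions
DROP INDEX big_table_col_idx;       -- needs a brief exclusive lock on big_table
RESET statement_timeout;

ALTER INDEX big_table_col_idx_new RENAME TO big_table_col_idx;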
I'd question the usefulness of running VACUUM FULL on a production server at all; there are other ways around it, e.g. pg_repack, or some ideas from this post: http://www.depesz.com/2013/06/21/bloat-removal-by-tuples-moving/
Regards,
Alexey Klyukin
Thank you for the very specific idea of pg_stat_user_tables. This is what I see (the output is also included in the email below, but this is easier to read): https://gist.github.com/anonymous/53f748a8c6c454b804b3

The output here (might become a jumbled mess):

=# SELECT * from pg_stat_user_tables where relname='bigtb';

 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | last_vacuum                   | last_autovacuum | last_analyze                  | last_autoanalyze
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+-------------------------------+-----------------+-------------------------------+------------------
 105954 | public     | bigtb   |        0 |            0 |   220396 |         89781 |     63516 |         6 |       910 |             1 |  634879579 |         39 | 2014-08-06 20:12:47.163055-04 |                 | 2014-08-06 20:19:40.317771-04 |
(1 row)

Time: 50.844 ms

We spent some time doing a massive cleaning of the data in this table and brought it down to around 630 million rows. The overall size of the table including indexes is still about 120 GB. More stats, as many as we could manage, are pretty-pasted here: https://gist.github.com/anonymous/21aaeae10584013c3820

The biggest table ("bigtb" -- a codename for pasting on a public forum) stores some URLs. The most important index for this table is on the "alias" column, which is varchar(35) as you can see. The table definition is also pasted below:

                     Table "public.bigtb"
     Column      |            Type             |            Modifiers
-----------------+-----------------------------+---------------------------------
 alias           | character varying(35)       | not null
 url             | text                        | not null
 user_registered | boolean                     |
 private_key     | character varying(6)        | default NULL::character varying
 modify_date     | timestamp without time zone | default now()
 ip              | bigint                      |
 url_md5         | text                        |
Indexes:
    "idx_bigtb_pkey" PRIMARY KEY, btree (alias)
    "idx_bigtb_ip_url" UNIQUE, btree (ip, url_md5)
    "idx_bigtb_modify_date" btree (modify_date)
    "idx_bigtb_urlmd5" btree (url_md5)
Check constraints:
    "bigtb_alias_check" CHECK (alias::text ~ '[-.~a-z0-9_]'::text)
Referenced by:
    TABLE "bigtb_registered" CONSTRAINT "fk_bigtb_registered" FOREIGN KEY (alias) REFERENCES bigtb(alias) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "interesting" CONSTRAINT "interesting_alias_fkey" FOREIGN KEY (alias) REFERENCES bigtb(alias) MATCH FULL ON UPDATE CASCADE
Rules:
    __track_bigtb_deleted AS
        ON DELETE TO bigtb
        WHERE NOT (EXISTS (SELECT bigtb_deleted.alias
                           FROM bigtb_deleted
                           WHERE bigtb_deleted.alias::text = old.alias::text))
        DO INSERT INTO bigtb_deleted (alias, url, user_registered, modify_date)
           VALUES (old.alias, old.url, old.user_registered, old.modify_date)

What else could I do here? As you will see in the output shared above (the GitHub Gist link), the stats for this table are:

bigtb - row count: 634,879,168
inserted: 65613
updated: 6
deleted: 1013

These are recent numbers. The DB has been going down often. Deletions would be around 20,000 per week; updates are the lowest. INSERT and SELECT volumes are huge, with SELECT of course being the biggest activity (high-traffic website).

We did put PgBouncer in place for some pooling benefits, and memcached to take some load off the PostgreSQL server. As of this writing, memcached is caching around 200,000 URLs that would otherwise each have been a query using the index on the "alias" column ("idx_bigtb_pkey").

What other info can I share?
Suppose we might have to explore partitioning, which would probably be via first letter of the alias? This would lead to around 26 + 9 = 35 sub-tables. Is this too many?

My CONFIG settings:

max_connections = 180              # Was 250! - http://www.php.net/manual/en/function.pg-pconnect.php#20309
superuser_reserved_connections = 5
shared_buffers = 512MB
effective_cache_size = 1200MB      # Nov 11 2011, was 1500MB
temp_buffers = 32MB                # min 800kB
maintenance_work_mem = 320MB       # min 1MB, was 128MB
work_mem = 64MB
wal_buffers = 20MB                 # min 32kB
fsync = on                         # turns forced synchronization on or off
checkpoint_segments = 128          # was 128
checkpoint_timeout = 1000          # was 1000
enable_indexscan = on
log_min_duration_statement = 1000

Much appreciate any further ideas!

On Sun, Aug 3, 2014 at 9:29 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 08/02/2014 07:37 PM, Phoenix Kiula wrote:
>>> In your original post you said it was stopping on pg_class so now I am
>>> confused.
>>
>> No need to be confused. The vacuum thing is a bit tricky for laymen
>> like myself; "pg_class" seemed to be associated with this table.
>> Anyway, even before the upgrade, the vacuum was stopping at this table
>> and taking forever.
>>
>> The question is: what now? Where can I give you information from?
>> IOSTAT I've already shared.
>>
>> Will the work_mem settings affect the manual REINDEX that's still
>> running? What can I do to speed up the REINDEX? Should I change my
>> autovacuum settings for this table specifically (it's the only mammoth
>> table in the DB, and our main one)?
>
> Adding to my previous post, some information from the statistics collector
> would be useful. See here for more information:
>
> http://www.postgresql.org/docs/9.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE
>
> For now, the output of:
>
> SELECT * from pg_stat_user_tables where relname='your_table_name';
>
> might prove helpful.
>
>> Thanks.
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
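For reference, a minimal sketch of what inheritance-based partitioning of bigtb by leading alias character could look like on 9.0 (declarative partitioning does not exist in this version); only two children are shown, and the trigger and index names are illustrative:

CREATE TABLE bigtb_a (CHECK (substr(alias, 1, 1) = 'a')) INHERITS (bigtb);
CREATE TABLE bigtb_b (CHECK (substr(alias, 1, 1) = 'b')) INHERITS (bigtb);
-- ... one child per leading character, each with its own indexes, e.g.:
CREATE UNIQUE INDEX bigtb_a_alias_idx ON bigtb_a (alias);

CREATE OR REPLACE FUNCTION bigtb_insert_router() RETURNS trigger AS $$
BEGIN
    -- send each new row to the child table matching its first character
    EXECUTE 'INSERT INTO bigtb_' || substr(NEW.alias, 1, 1) || ' SELECT ($1).*'
        USING NEW;
    RETURN NULL;    -- suppress the insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bigtb_partition_insert
    BEFORE INSERT ON bigtb
    FOR EACH ROW EXECUTE PROCEDURE bigtb_insert_router();

One caveat: the existing foreign keys that reference bigtb(alias) would not see rows stored in the children, which is a well-known limitation of inheritance partitioning.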
Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

> We spent some time doing a massive cleaning of the data in this
> table and brought it down to around 630 million rows. The overall
> size of the table including indexes is still about 120 GB.

Deleting rows that you don't need is good, and once a vacuum has a chance to run (to completion) against the table it should help with performance. But unless there is a lot of free space right at the end of the table, it won't release any disk space back to the OS; the space will be tracked as free space within the table and reused for future inserts and updates. This is generally a good thing, because it is faster to use space already allocated to the table than to request that new space be added to the table before using it.

> modify_date     | timestamp without time zone | default now()

Sooner or later you will realize that this should have been timestamp with time zone, but that will be a different discussion.

> Rules:
>     __track_bigtb_deleted AS
>         ON DELETE TO bigtb

It is usually safer to create triggers rather than rules.

> Suppose we might have to explore partitioning, which would
> probably be via first letter of the alias? This would lead to
> around 26 + 9 = 35 sub-tables. Is this too many?

No; I wouldn't worry about fewer than about 100. As has already been mentioned, though, this machine is very underpowered for what you seem to want to do with it. Don't expect miracles. In particular, partitioning is not likely to make most queries any faster, but it will help a lot with maintenance operations, like vacuuming and indexing.

> max_connections = 180
> temp_buffers = 32MB
> work_mem = 64MB

I just want to make sure you realize that temp_buffers is how much RAM *each connection* is allowed to reserve indefinitely for caching temporary tables. So if all 180 allowed connections were in use, and they had all used temporary tables of significant size, then *even when all connections are idle* they would have 5.76 GB of RAM reserved exclusively for caching temp tables. On a machine with 4 GB of RAM that would probably cause things to crash.

work_mem is also questionable. It is not limited to one allocation per connection; there can be one allocation of that size for each plan node of an active query which needs working memory (sorts, hash maps, etc.). So one connection can be using several of these at one time, although only while a query is active. Because one connection may be using many while others are using none, it is often a good idea to start by sizing it on the assumption of one allocation per connection: 64MB * 180 = 11.52 GB. This is in addition to the 5.76 GB you allow for temp_buffers. It is no wonder you are seeing crashes; you have configured the database so that it is allowed to use over 4x the machine's RAM just for these two things!

In my experience, a good starting point for work_mem is 25% of machine RAM / max_connections. You can adjust from there based on workload. That suggests 5.5MB would be about right on your machine. I would probably set temp_buffers to 2MB or maybe 3MB.

> enable_indexscan = on

These should all be on in the config file, always. (That is the default if the entries are commented out, of course.) The enable_* settings are mostly intended for diagnostic purposes, although in extreme cases people have been known to disable a specific setting just for the duration of a specific query; there is usually a better solution than that, however.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
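To make the rule-versus-trigger point concrete, here is an untested sketch of the same archiving logic as a trigger, using the names from the rule definition posted earlier in the thread:

CREATE OR REPLACE FUNCTION track_bigtb_deleted() RETURNS trigger AS $$
BEGIN
    -- archive the row unless an entry for this alias already exists
    INSERT INTO bigtb_deleted (alias, url, user_registered, modify_date)
    SELECT OLD.alias, OLD.url, OLD.user_registered, OLD.modify_date
    WHERE NOT EXISTS (SELECT 1 FROM bigtb_deleted d WHERE d.alias = OLD.alias);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

-- DROP RULE __track_bigtb_deleted ON bigtb;   -- once the trigger is in place
CREATE TRIGGER trg_track_bigtb_deleted
    BEFORE DELETE ON bigtb
    FOR EACH ROW EXECUTE PROCEDURE track_bigtb_deleted();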