Home > mailing lists

Re: Reindex taking forever, and 99% CPU - Mailing list pgsql-general

From	Kevin Grittner
Subject	Re: Reindex taking forever, and 99% CPU
Date	August 7, 2014 16:11:08
Msg-id	1407417059.87618.YahooMailNeo@web122304.mail.ne1.yahoo.com Whole thread Raw
In response to	Re: Reindex taking forever, and 99% CPU (Phoenix Kiula <phoenix.kiula@gmail.com>)
List	pgsql-general

Tree view

Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

> We spent some time to do some massive cleaning of the data from
> this table. Brought it down to around 630 million rows. Overall
> size of the table including indexes is about 120GB anyway.

Deleting rows that you don't need is good, and once a vacuum has a
chance to run (to completion) against the table it should help with
performance, but unless there is a lot of free space right at the
end of the table, it won't release any disk space back to the OS --
it will be tracked as free space within the table, and reused for
future inserts and updates.  This is generally a good thing,
because it is faster to use space already allocated to the table
than to request that new space is added to the table before using
it.

> modify_date    | timestamp without time zone | default now()

Sooner or later you will realize that this should have been
timestamp with time zone, but that will be a different discussion.

> Rules:
>     __track_bigtb_deleted AS
>     ON DELETE TO bigtb

It is usually safer to create triggers rather than rules.

> Suppose we might have to explore partitioning, which would
> probably be via first letter of the alias? This would lead to
> around 26 + 9 = 35 sub-tables. Is this too many?

No; I wouldn't worry about less than about 100.  As has already
been mentioned, though, this machine is very underpowered for what
you seem to want to do with it.  Don't expect miracles.  In
particular, this is not likely to make most queries any faster, but
will help a lot with maintenance operations, like vacuuming and
indexing.

> max_connections                = 180

> temp_buffers                    = 32MB

> work_mem                        = 64MB

I just want to make sure you realize that temp_buffers is how much
RAM *each connection* is allowed to reserve indefinitely for
caching temporary tables.  So if all 180 allowed connections were
in use, and they had all used temporary tables of significant size,
then *even when all connections are idle* they would have 5.76GB of
RAM reserved exclusively for caching temp tables.  On a machine
with 4GB RAM that would probably cause things to crash.

Also, work_mem is questionable.  This is not limited to one per
connection; there can be one allocation of that size for each plan
node of an active query which needs working memory (sorts, hash
maps, etc.).  So one connection can be using a number of these at
one time, although only when a query is active.  Because one
connection may be using many, while others are using none, it is
often a good idea to start from the assumption that it should be
sized on the assumption of one allocation per connection.  64MB *
180 = 11.52GB.  This is in addition to the 5.76GB you allow for
temp_buffers.  It is no wonder you are seeing crashes -- you have
configured the database so that it is allowed to use 4x the
machine's RAM just for these two things!

In my experience, a good starting point for work_mem is 25% of
machine RAM / max_connections.  You can adjust from there based on
workload.  That suggests 5.5MB would be about right on your
machine.  I would probably set temp_buffers = 2MB or maybe 3MB.

> enable_indexscan                = on

These should all be on in the config file, always.  (That is the
default if the entries are commented out, of course.)  The enable_*
settings are mostly intended for diagnostic purposes, although in
extreme cases people have been known to disable a specific setting
just for the duration of a specific query; there is usually a
better solution than that, however.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-general by date:

From: Gregory Taylor
Date: 07 August 2014, 15:30:03
Subject: Re: Recursive CTE trees + Sorting by votes

From: Ray Stell
Date: 07 August 2014, 16:32:05
Subject: Re: Connecting with PostgreSQL 9.1 using the Ubuntu machine user and peer authentication method

Re: Reindex taking forever, and 99% CPU - Mailing list pgsql-general

Previous

Next