Re: 121+ million record table perf problems - Mailing list pgsql-performance

From Greg Smith
Subject Re: 121+ million record table perf problems
Date
Msg-id Pine.GSO.4.64.0705182347110.29971@westnet.com
In response to 121+ million record table perf problems  (cyber-postgres@midnightfantasy.com)
List pgsql-performance
On Fri, 18 May 2007, cyber-postgres@midnightfantasy.com wrote:

> shared_buffers = 24MB
> work_mem = 256MB
> maintenance_work_mem = 512MB

You should take a minute to follow the suggestions at
http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm and set
dramatically higher values for shared_buffers and effective_cache_size for
your server.  Also, your work_mem figure may be OK for now, but if you
ever do have 10 people connect to this database at once and run big
queries you could have an issue with it set that high--that's a
per-client setting.
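
As a very rough sketch (these particular numbers are mine, not from the
original post, and the right values depend on how much RAM the machine
actually has; assume something like 4GB mostly dedicated to the
database), postgresql.conf might end up looking like:

    shared_buffers = 1GB          # up from 24MB; roughly 1/4 of RAM, needs a restart
    effective_cache_size = 3GB    # roughly what the OS is caching on the database's behalf
    work_mem = 64MB               # remember: per client, per sort/hash operation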

After you're done with that, you should also follow the suggestions there
to do a VACUUM ANALYZE.  That may knock out two other potential issues at
once.  It will take a while to run, but I think you need it badly to sort
out what you've already done.
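
In psql that's just:

    VACUUM ANALYZE;        -- the whole database
    VACUUM ANALYZE res;    -- or just the big table, to start with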

> random_page_cost = 100

I'm not sure what logic prompted this change, but after you correct the
above you should return this to its default; if this is helping now it's
only because other things are so far off from where they should be.
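
That is, back in postgresql.conf:

    random_page_cost = 4.0    # the default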

> update res set state=5001;
> The update query that started this all I had to kill after 17hours.  It
> should have updated all 121+ million records.  That brought my select count
> down to 19 minutes, but still a far cry from acceptable.

You should work on the select side of this first.  If that isn't running
in a moderate amount of time, trying to get the much more difficult update
to happen quickly is hopeless.
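
To see where that time is actually going (almost certainly a sequential
scan over the entire table), start with:

    EXPLAIN ANALYZE SELECT count(*) FROM res;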

Once the select is under control, there are a lot of parameters to adjust
that will affect the speed of the updates.  The first thing to do is
dramatically increase checkpoint_segments; I would set that to at least 30
in your situation.
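
In postgresql.conf:

    checkpoint_segments = 30    # default is 3; a config reload is enough to pick this up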

Also:  going to RAID-1 won't make a bit of difference to your update
speed; it could even make it worse.  Adding more RAM may not help much
either.  If you don't have one already, the real key to improving
performance in a heavy update situation is to get a better disk controller
with a cache that helps accelerate writes.  Then the next step is to
stripe this data across multiple disks in a RAID-0 configuration to split
the I/O up.

You have a lot of work ahead of you.  Even after you resolve the gross
issues here, you have a table that has around 10 indexes on it.
Maintaining those is far from free; every time you update a single record
in that table, the system has to update each of those indexes on top of
the record update itself.  So you're really asking your system to do
around 1.2 billion disk-related operations--121 million rows times
roughly ten index entries each, on top of the row updates themselves--
when you throw out your simple batch update against every row, and good
luck getting that to run in a time frame that's less than days long.
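
You can list exactly what is there with:

    SELECT indexname, indexdef
      FROM pg_indexes
     WHERE tablename = 'res';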

The right way to get a feel for what's going on is to drop all the indexes
except for the constraints and see how the bulk update runs after the
parameter changes suggested above are in place and the database has been
cleaned up with vacuum+analyze.  Once you have a feel for that, add some
indexes back in and see how it degrades.  Then you'll know how adding each
one of them impacts your performance.  I suspect you're going to have to
redesign your indexing scheme before this is over.  I don't think your
current design is ever going to work the way you expect it to.
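
A sketch of that test cycle; the index names here are made up, so
substitute your own from the pg_indexes query above:

    -- drop everything except the primary key / constraint indexes
    DROP INDEX res_state_idx;
    DROP INDEX res_date_idx;

    -- time the bulk update with the new settings in place
    \timing
    UPDATE res SET state = 5001;

    -- then recreate the indexes one at a time and re-test
    CREATE INDEX res_state_idx ON res (state);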

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
