Thread: Massive table (500M rows) update nightmare

Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
Our DB has an audit table which is 500M rows and growing. (FYI the objects
being audited are grouped semantically, not individual field values).

Recently we wanted to add a new feature and we altered the table to add a
new column. We are backfilling this varchar(255) column by writing a TCL
script to page through the rows (where every update is an UPDATE ... WHERE id
 >= x AND id < x+10, and a commit is performed after every 1000 UPDATE
statements, i.e. every 10,000 rows.)

We have 10 columns, six of which are indexed. Rough calculations suggest
that this will take two to three weeks to complete on an 8-core CPU with
more than enough memory.

As a ballpark estimate - is this sort of performance for 500M updates
what one would expect of PG given the table structure (detailed below) or
should I dig deeper to look for performance issues?

As always, thanks!

Carlo

Table/index structure:

CREATE TABLE mdx_core.audit_impt
(
  audit_impt_id serial NOT NULL,
  impt_session integer,
  impt_version character varying(255),
  impt_name character varying(255),
  impt_id integer,
  target_table character varying(255),
  target_id integer,
  target_op character varying(10),
  note text,
  source_table character varying(255),
  CONSTRAINT audit_impt_pkey PRIMARY KEY (audit_impt_id)
)

CREATE INDEX audit_impt_impt_id_idx
  ON mdx_core.audit_impt
  USING btree
  (impt_id);
CREATE INDEX audit_impt_impt_name
  ON mdx_core.audit_impt
  USING btree
  (impt_name, impt_version);
CREATE INDEX audit_impt_session_idx
  ON mdx_core.audit_impt
  USING btree
  (impt_session);
CREATE INDEX audit_impt_source_table
  ON mdx_core.audit_impt
  USING btree
  (source_table);
CREATE INDEX audit_impt_target_id_idx
  ON mdx_core.audit_impt
  USING btree
  (target_id, audit_impt_id);
CREATE INDEX audit_impt_target_table_idx
  ON mdx_core.audit_impt
  USING btree
  (target_table, target_id, audit_impt_id);




Re: Massive table (500M rows) update nightmare

From
Scott Marlowe
Date:
On Thu, Jan 7, 2010 at 12:17 AM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> Our DB has an audit table which is 500M rows and growing. (FYI the objects
> being audited are grouped semantically, not individual field values).
>
> Recently we wanted to add a new feature and we altered the table to add a
> new column. We are backfilling this varchar(255) column by writing a TCL
> script to page through the rows (where every update is a UPDATE ... WHERE id
>>= x AND id < x+10 and a commit is performed after every 1000 updates
> statement, i.e. every 10000 rows.)
>
> We have 10 columns, six of which are indexed. Rough calculations suggest
> that this will take two to three weeks to complete on an 8-core CPU with
> more than enough memory.
>
> As a ballpark estimate - is this sort of performance for an 500M updates
> what one would expect of PG given the table structure (detailed below) or
> should I dig deeper to look for performance issues?

Got an explain analyze of the update query?

Re: Massive table (500M rows) update nightmare

From
Leo Mannhart
Date:
Carlo Stonebanks wrote:
> Our DB has an audit table which is 500M rows and growing. (FYI the
> objects being audited are grouped semantically, not individual field
> values).
>
> Recently we wanted to add a new feature and we altered the table to add
> a new column. We are backfilling this varchar(255) column by writing a
> TCL script to page through the rows (where every update is a UPDATE ...
> WHERE id >= x AND id < x+10 and a commit is performed after every 1000
> updates statement, i.e. every 10000 rows.)
>
> We have 10 columns, six of which are indexed. Rough calculations suggest
> that this will take two to three weeks to complete on an 8-core CPU with
> more than enough memory.
>
> As a ballpark estimate - is this sort of performance for an 500M updates
> what one would expect of PG given the table structure (detailed below)
> or should I dig deeper to look for performance issues?
>
> As always, thanks!
>
> Carlo
>

If it is possible to lock this audit table exclusively (maybe during
off-peak hours) I would look into
- create new_audit_table as select col1, col2, col3 ... col9,
'new_col_value' from old_audit_table;
- create all indexes
- drop old_audit_table
- rename new_audit_table to old_audit_table

That is probably the fastest method you can use, even if you have to join
the "new_col_value" from an extra helper table with the corresponding id.
Remember, databases are born to join.
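
For reference, a rough SQL sketch of this copy-and-swap approach, assuming the
new value can be derived from impt_name as in the UPDATE shown later in the
thread; the serial default/sequence and all six indexes have to be re-attached
by hand, and the table must stay locked for the duration (index names here are
illustrative):

BEGIN;
LOCK TABLE mdx_core.audit_impt IN ACCESS EXCLUSIVE MODE;

CREATE TABLE mdx_core.audit_impt_new AS
SELECT audit_impt_id, impt_session, impt_version, impt_name, impt_id,
       target_table, target_id, target_op, note,
       'mdx_import.' || impt_name AS source_table
FROM mdx_core.audit_impt;

-- re-attach the primary key and recreate the six indexes, e.g.:
ALTER TABLE mdx_core.audit_impt_new ADD PRIMARY KEY (audit_impt_id);
CREATE INDEX audit_impt_impt_id_idx2 ON mdx_core.audit_impt_new (impt_id);
-- ... remaining indexes, plus the serial default on audit_impt_id ...

ALTER TABLE mdx_core.audit_impt RENAME TO audit_impt_old;
ALTER TABLE mdx_core.audit_impt_new RENAME TO audit_impt;
COMMIT;
-- DROP TABLE mdx_core.audit_impt_old;  -- once everything checks out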

You could also try to just update the whole table in one go, it is
probably faster than you expect.

just a thought
Leo

Re: Massive table (500M rows) update nightmare

From
"Kevin Grittner"
Date:
Leo Mannhart <leo.mannhart@beecom.ch> wrote:

> You could also try to just update the whole table in one go, it is
> probably faster than you expect.

That would, of course, bloat the table and indexes horribly.  One
advantage of the incremental approach is that there is a chance for
autovacuum or scheduled vacuums to make space available for re-use
by subsequent updates.

-Kevin

Re: Massive table (500M rows) update nightmare

From
Leo Mannhart
Date:
Kevin Grittner wrote:
> Leo Mannhart <leo.mannhart@beecom.ch> wrote:
>
>> You could also try to just update the whole table in one go, it is
>> probably faster than you expect.
>
> That would, of course, bloat the table and indexes horribly.  One
> advantage of the incremental approach is that there is a chance for
> autovacuum or scheduled vacuums to make space available for re-use
> by subsequent updates.
>
> -Kevin
>

ouch...
thanks for correcting this.
... and forgive an old man coming from Oracle ;)

Leo

Re: Massive table (500M rows) update nightmare

From
Ludwik Dylag
Date:
I would suggest:
1. turn off autovacuum
1a. optionally tune the DB for better performance for this kind of operation (can't help with that here)
2. restart database
3. drop all indexes
4. update
5. vacuum full table
6. create indexes
7. turn on autovacuum
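
A hedged SQL sketch of steps 3-6, using the index definitions from the first
post and the SET expression from Carlo's UPDATE later in the thread (whether
the WHERE filter is still wanted in a single pass, and whether the downtime is
acceptable, are open questions; see also Kevin Grittner's caveat below about
how long VACUUM FULL can take):

-- step 3: drop the six secondary indexes (the primary key stays)
DROP INDEX mdx_core.audit_impt_impt_id_idx;
DROP INDEX mdx_core.audit_impt_impt_name;
DROP INDEX mdx_core.audit_impt_session_idx;
DROP INDEX mdx_core.audit_impt_source_table;
DROP INDEX mdx_core.audit_impt_target_id_idx;
DROP INDEX mdx_core.audit_impt_target_table_idx;

-- step 4: one-pass backfill
UPDATE mdx_core.audit_impt
SET source_table = 'mdx_import.' || impt_name
WHERE coalesce(source_table, '') = '';

-- step 5: reclaim the dead tuples left behind by the update
VACUUM FULL mdx_core.audit_impt;

-- step 6: recreate the indexes, e.g.
CREATE INDEX audit_impt_impt_id_idx ON mdx_core.audit_impt USING btree (impt_id);
-- ... and the other five ...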

Ludwik


2010/1/7 Leo Mannhart <leo.mannhart@beecom.ch>
Kevin Grittner wrote:
> Leo Mannhart <leo.mannhart@beecom.ch> wrote:
>
>> You could also try to just update the whole table in one go, it is
>> probably faster than you expect.
>
> That would, of course, bloat the table and indexes horribly.  One
> advantage of the incremental approach is that there is a chance for
> autovacuum or scheduled vacuums to make space available for re-use
> by subsequent updates.
>
> -Kevin
>

ouch...
thanks for correcting this.
... and forgive an old man coming from Oracle ;)

Leo




--
Ludwik Dyląg

Re: Massive table (500M rows) update nightmare

From
"Kevin Grittner"
Date:
Ludwik Dylag <ldylag@gmail.com> wrote:
> I would suggest:
> 1. turn off autovacuum
> 1a. optionally tune the DB for better performance for this kind of
>     operation (can't help with that here)
> 2. restart database
> 3. drop all indexes
> 4. update
> 5. vacuum full table
> 6. create indexes
> 7. turn on autovacuum

I've only ever attempted something like that with a few tens of
millions of rows.  I gave up on waiting for the VACUUM FULL step
after a few days.

If some scheduled down time is acceptable ("some" being kind of hard
to estimate accurately), the best bet would be to add the column with
the USING clause to fill in the value.  (I think that would cause a
table rewrite; if not, then add something to the ALTER TABLE which
would.)  My impression is that the OP would rather stretch out the
implementation than to suffer down time, which can certainly be a
valid call.
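
Presumably the "USING clause" here refers to ALTER TABLE ... ALTER COLUMN ...
TYPE ... USING, which rewrites the table in one pass and can compute the new
value per row from other columns. A minimal sketch, assuming a maintenance
window (it takes an ACCESS EXCLUSIVE lock for the whole rewrite and rebuilds
all the indexes as part of it):

ALTER TABLE mdx_core.audit_impt
  ALTER COLUMN source_table TYPE character varying(255)
  USING ('mdx_import.' || impt_name);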

If that is the goal, then the real question is whether there's a way
to tune the incremental updates to speed that phase.  Carlo, what
version of PostgreSQL is this?  Can you show us the results of an
EXPLAIN ANALYZE for the run of one iteration of the UPDATE?
Information on the OS, hardware, PostgreSQL build configuration, and
the contents of postgresql.conf (excluding all comments) could help
us spot possible techniques to speed this up.

-Kevin

Re: Massive table (500M rows) update nightmare

From
Kevin Kempter
Date:
On Thursday 07 January 2010 09:57:59 Kevin Grittner wrote:
> Ludwik Dylag <ldylag@gmail.com> wrote:
> > I would suggest:
> > 1. turn off autovacuum
> > 1a. optionally tune the DB for better performance for this kind of
> >     operation (can't help with that here)
> > 2. restart database
> > 3. drop all indexes
> > 4. update
> > 5. vacuum full table
> > 6. create indexes
> > 7. turn on autovacuum
>
> I've only ever attempted something like that with a few tens of
> millions of rows.  I gave up on waiting for the VACUUM FULL step
> after a few days.
>
> If some scheduled down time is acceptable ("some" being kind of hard
> to estimate accurately), the best bet would be to add the column with
> the USING clause to fill in the value.  (I think that would cause a
> table rewrite; if not, then add something to the ALTER TABLE which
> would.)  My impression is that the OP would rather stretch out the
> implementation than to suffer down time, which can certainly be a
> valid call.
>
> If that is the goal, then the real question is whether there's a way
> to tune the incremental updates to speed that phase.  Carlo, what
> version of PostgreSQL is this?  Can you show us the results of an
> EXPLAIN ANALYZE for the run of one iteration of the UPDATE?
> Information on the OS, hardware, PostgreSQL build configuration, and
> the contents of postgresql.conf (excluding all comments) could help
> us spot possible techniques to speed this up.
>
> -Kevin
>



If you can come up with an effective method of tracking updates/deletes/inserts,
such as a trigger that writes the PK to a separate table upon any insert,
update or delete to the table, you could do something like this:



1) create new table (no indexes) with the structure you want the table to have
at the end of the process (i.e. the post-altered state) [new_tab]

2) create the insert,update,delete triggers mentioned above on the existing
table [curr_tab] and write all the PK id's that change into a 3rd table
[changed_keys]

3) kick off a process that simply does a select from curr_tab into new_tab and
populates/back-fills the new column as part of the query

4) let it run as long as it takes

5) once it's complete do this:

   Create all the indexes on new_tab

   BEGIN;

   LOCK TABLE curr_tab;

   DELETE from new_tab
       where pk_id in (select distinct pk_id from changed_keys);

   INSERT into new_tab
       select * from curr_tab
       where curr_tab.pk_id in (select distinct pk_id from changed_keys);

   ALTER TABLE curr_tab RENAME to old_tab;

   ALTER TABLE new_tab RENAME to curr_tab;

   COMMIT;



Also you might want to consider partitioning this table in the process...



Once you're confident you no longer need the old table [old_tab] you can drop
it
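
For step 2, a minimal sketch of what the change-tracking trigger could look
like (all names here are illustrative and assume audit_impt_id is the PK being
logged):

CREATE TABLE mdx_core.changed_keys (pk_id integer NOT NULL);

CREATE OR REPLACE FUNCTION mdx_core.track_audit_impt_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO mdx_core.changed_keys (pk_id) VALUES (OLD.audit_impt_id);
    ELSE
        INSERT INTO mdx_core.changed_keys (pk_id) VALUES (NEW.audit_impt_id);
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_impt_track_change
    AFTER INSERT OR UPDATE OR DELETE ON mdx_core.audit_impt
    FOR EACH ROW EXECUTE PROCEDURE mdx_core.track_audit_impt_change();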



Re: Massive table (500M rows) update nightmare

From
"Greg Sabino Mullane"
Date:


> If it is possible to lock this audit table exclusively (maybe during
> off-peak hours) I would look into
> - create new_audit_table as select col1, col2, col3 ... col9,
> 'new_col_value' from old_audit_table;
> - create all indexes
> - drop old_audit_table
> - rename new_audit_table to old_audit_table

This is a good approach, but you don't necessarily have to exclusively
lock the table. Only allowing reads would be enough, or you could
install a trigger to keep track of which rows were updated. Then
the process becomes:

1. Create trigger on the old table to store changed pks
2. Create new_audit_table as select col1, col2, col3 ... col9,
  'new_col_value' from old_audit_table;
3. Create all indexes on the new table
4. Stop your app from writing to the old table
5. COPY over the rows that have changed
6. Rename the old table to something else (for safety)
7. Rename the new table to the real name
8. Drop the old table when all is good

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/



Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
> Got an explain analyze of the update query?

UPDATE mdx_core.audit_impt
SET source_table = 'mdx_import.'||impt_name
WHERE audit_impt_id >= 319400001 AND audit_impt_id <= 319400010
AND coalesce(source_table, '') = ''

Index Scan using audit_impt_pkey on audit_impt  (cost=0.00..92.63 rows=1
width=608) (actual time=0.081..0.244 rows=10 loops=1)
  Index Cond: ((audit_impt_id >= 319400001) AND (audit_impt_id <=
319400010))
  Filter: ((COALESCE(source_table, ''::character varying))::text = ''::text)
Total runtime: 372.141 ms

Hard to tell how reliable these numbers are, because the caches are likely
spun up for the WHERE clause - in particular, SELECT queries have been run
to test whether the rows actually qualify for the update.

The coalesce may be slowing things down slightly, but is a necessary evil.


Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
> If it is possible to lock this audit table exclusively (maybe during
> off-peak hours) I would look into
> - create new_audit_table as select col1, col2, col3 ... col9,
> 'new_col_value' from old_audit_table;
> - create all indexes
> - drop old_audit_table
> - rename new_audit_table to old_audit_table
>
> That is probably the fastest method you can use, even if you have to join
> the "new_col_value" from an extra helper table with the corresponding id.
> Remember, databases are born to join.
>

This has all been done before - the production team was crippled while they
waited for this, and the SECOND the table was available again they jumped on
it - even though it meant recreating only the bare minimum of indexes.

> You could also try to just update the whole table in one go, it is
> probably faster than you expect.

Possibly, but with such a large table you have no idea of the progress, and you
cannot interrupt it without rolling back everything. Worse, you have
applications stalling and users wanting to know what is going on - are the OS
and the DB/MVCC thrashing while doing internal maintenance? Have you
reached some sort of deadlock condition that you can't see because the
server status is not helpful with so many uncommitted pending updates?

And of course, there is the file bloat.


Re: Massive table (500M rows) update nightmare

From
marcin mank
Date:
> every update is a UPDATE ... WHERE id
>>= x AND id < x+10 and a commit is performed after every 1000 updates
> statement, i.e. every 10000 rows.

What is the rationale behind this? How about doing 10k rows in 1
update, and committing every time?

You could try making the condition on the ctid column, to not have to
use the index on ID, and process the rows in physical order. First
make sure that newly inserted production data has the correct value in
the new column, and add 'where new_column is null' to the conditions.
But I have never tried this, use at your own risk.
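
A hedged sketch of the ctid idea, with the same caveat that it is untested
(the 10000-row batch size is illustrative, and each batch's subquery still has
to scan past rows that were already backfilled, so this only approximates
physical-order processing):

UPDATE mdx_core.audit_impt
SET source_table = 'mdx_import.' || impt_name
WHERE ctid = ANY (ARRAY(SELECT ctid
                        FROM mdx_core.audit_impt
                        WHERE source_table IS NULL
                        LIMIT 10000));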

Greetings
Marcin Mank

Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
> What is the rationale behind this? How about doing 10k rows in 1
> update, and committing every time?

When we did 10K updates, the application would sometimes appear to have
frozen, and we were concerned that there was a deadlock condition because of
the number of locked rows. While we may have the patience to sit around and
wait five minutes to see if the update would continue, we couldn't risk
having other applications appear frozen if that was the case. In fact, there
is no reason for any user or application to write to the same records we are
writing to - but the audit table is read frequently. We are not explicitly
locking anything, or writing any additional code to arbitrate the locking
model anywhere - it's all default UPDATE and SELECT syntax.

Doing the updates in smaller chunks resolved these apparent freezes - or,
more specifically, when the application DID freeze, it didn't do it for more
than 30 seconds. In all likelihood, this is the OS and the DB thrashing.

We have since modified the updates to process 1000 rows at a time with a
commit every 10 pages. Just this morning, though, the IS manager asked me to
stop the backfill because of the load's effect on other processes.

> You could try making the condition on the ctid column, to not have to
> use the index on ID, and process the rows in physical order.

An interesting idea, if I can confirm that the performance problem is
because of the WHERE clause, not the UPDATE.

>'where new_column is null' to the conditions.

Already being done, albeit with a coalesce(val, '') = '' - it's quite
possible that this is hurting the WHERE clause; the EXPLAIN shows the table
using the pkey and then filtering on the COALESCE as one would expect.

Carlo


Re: Massive table (500M rows) update nightmare

From
"Kevin Grittner"
Date:
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:

> An interesting idea, if I can confirm that the performance problem
> is because of the WHERE clause, not the UPDATE.

If you could show EXPLAIN ANALYZE output for one iteration, with
related queries and maybe more info on the environment, it would
take most of the guesswork out of things.

-Kevin

Re: Massive table (500M rows) update nightmare

From
Scott Marlowe
Date:
On Thu, Jan 7, 2010 at 2:48 PM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> Doing the updates in smaller chunks resolved these apparent freezes - or,
> more specifically, when the application DID freeze, it didn't do it for more
> than 30 seconds. In all likelyhood, this is the OS and the DB thrashing.

It might well be checkpoints.  Have you tried cranking up checkpoint
segments to something like 100 or more and seeing how it behaves then?

Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
> It might well be checkpoints.  Have you tried cranking up checkpoint
> segments to something like 100 or more and seeing how it behaves then?

No I haven't, although it certainly makes sense - watching the process run,
you get this sense that the system occasionally pauses to take a deep, long
breath before returning to work frantically ;D

Checkpoint_segments is currently set to 64. The DB is large and is in a
constant state of receiving single-row updates as multiple ETL and
refinement processes run continuously.

Would you expect going to 100 or more to make an appreciable difference, or
should I be more aggressive?


Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
Already done in an earlier post, Kevin - I have included it again below. As
you can see, it's pretty well what you would expect: an index scan plus a
filter.

One note: updates where no rows qualify run appreciably faster than the ones
that do. That is, the update itself appears to be consuming a good deal of
the processing time. This may be due to the 6 indexes.

UPDATE mdx_core.audit_impt
SET source_table = 'mdx_import.'||impt_name
WHERE audit_impt_id >= 319400001 AND audit_impt_id <= 319400010
AND coalesce(source_table, '') = ''

Index Scan using audit_impt_pkey on audit_impt  (cost=0.00..92.63 rows=1
width=608) (actual time=0.081..0.244 rows=10 loops=1)
  Index Cond: ((audit_impt_id >= 319400001) AND (audit_impt_id <=
319400010))
  Filter: ((COALESCE(source_table, ''::character varying))::text = ''::text)
Total runtime: 372.141 ms




""Kevin Grittner"" <Kevin.Grittner@wicourts.gov> wrote in message
news:4B462563020000250002DFA3@gw.wicourts.gov...
> "Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:
>
>> An interesting idea, if I can confirm that the performance problem
>> is because of the WHERE clause, not the UPDATE.
>
> If you could show EXPLAIN ANALYZE output for one iteration, with
> related queries and maybe more info on the environment, it would
> take most of the guesswork out of things.
>
> -Kevin
>


Re: Massive table (500M rows) update nightmare

From
Scott Marlowe
Date:
On Thu, Jan 7, 2010 at 11:14 PM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
>> It might well be checkpoints.  Have you tried cranking up checkpoint
>> segments to something like 100 or more and seeing how it behaves then?
>
> No I haven't, although it certainly makes sense - watching the process run,
> you get this sense that the system occasionally pauses to take a deep, long
> breath before returning to work frantically ;D
>
> Checkpoint_segments is currently set to 64. The DB is large and is in a
> constant state of receiving single-row updates as multiple ETL and
> refinement processes run continuously.
>
> Would you expect going to 100 or more to make an appreciable difference, or
> should I be more aggressive?

If you're already at 64 then not much.  Probably wouldn't hurt to
crank it up more and delay the checkpoints as much as possible during
these updates.  64 segments is already 1024M.  If you're changing a
lot more data than that in a single update / insert then cranking them
up more might help.

 What you might need to do is to change your completion target to
something closer to 100%.  Since it's likely that most of the updates /
inserts are not happening to the same rows over and over, but to
different rows each time, the closer you can get to 100% completed
before the next checkpoint the better.  This will cause some more IO
to happen, but will even it out more (hopefully) so that you don't get
checkpoint spikes.  Look into the checkpoint logging options so you
can monitor how they're affecting system performance.
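
The relevant postgresql.conf knobs, for reference (the values are only
illustrative starting points, not recommendations from the thread):

checkpoint_segments = 64              # already 1GB of WAL between checkpoints
checkpoint_completion_target = 0.9    # spread checkpoint writes over most of the interval
log_checkpoints = on                  # 8.3+: log each checkpoint to correlate with the stalls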

Re: Massive table (500M rows) update nightmare

From
Eduardo Morras
Date:
At 08:17 07/01/2010, Carlo Stonebanks wrote:
>Our DB has an audit table which is 500M rows and growing. (FYI the objects being audited are grouped semantically, not
>individual field values).
>
>Recently we wanted to add a new feature and we altered the table to add a new column. We are backfilling this
>varchar(255) column by writing a TCL script to page through the rows (where every update is an UPDATE ... WHERE id >= x
>AND id < x+10, and a commit is performed after every 1000 UPDATE statements, i.e. every 10,000 rows.)
>
>We have 10 columns, six of which are indexed. Rough calculations suggest that this will take two to three weeks to
>complete on an 8-core CPU with more than enough memory.
>
>As a ballpark estimate - is this sort of performance for 500M updates what one would expect of PG given the table
>structure (detailed below) or should I dig deeper to look for performance issues?
>
>As always, thanks!
>
>Carlo

You can dump the table with pg_dump and get a file like this (only a few columns shown, not all of them). Note that the
data lines are in TSV (tab-separated values) format, each terminated by a LF.

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

CREATE TABLE mdx_core.audit_impt
(
 audit_impt_id serial NOT NULL,
 impt_session integer,
 impt_version character varying(255)
);

COPY mdx_core.audit_impt (audit_impt_id, impt_session, impt_version) FROM stdin;
1       1       tateti
2       32      coll
\.

You can add a new column by hacking the dump: add the new column to the schema, add its name to the COPY statement, and
append a tab plus the data at the end of each line (before the LF). Note that the table name is different from the original.

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE mdx_core.audit_impt2
(
 audit_impt_id serial NOT NULL,
 impt_session integer,
 impt_version character varying(255),
 source_table character varying(255)
);
COPY mdx_core.audit_impt2 (audit_impt_id, impt_session, impt_version, source_table) FROM stdin;
1       1       tateti  tentown
1       32      coll    krof
\.


After this, add indexes, constraints as usual.
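
If editing a 78GB dump by hand is impractical, the same round trip can be
sketched with COPY directly (server-side COPY TO/FROM needs superuser and a
path writable by the PostgreSQL service account; the file path, and the idea
of leaving the new column NULL at load time, are assumptions rather than
anything spelled out above):

COPY mdx_core.audit_impt (audit_impt_id, impt_session, impt_version)
  TO 'C:/pgtemp/audit_impt.tsv';

-- create mdx_core.audit_impt2 with the extra source_table column, then:
COPY mdx_core.audit_impt2 (audit_impt_id, impt_session, impt_version)
  FROM 'C:/pgtemp/audit_impt.tsv';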

HTH

--------------------------------
Eduardo Morrás González
S21sec Labs


Re: Massive table (500M rows) update nightmare

From
"Kevin Grittner"
Date:
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:

> Already done in an earlier post

Perhaps I misunderstood; I thought that post mentioned that the plan
was one statement in an iteration, and that the cache would have
been primed by a previous query checking whether there were any rows
to update.  If that was the case, it might be worthwhile to look at
the entire flow of an iteration.

Also, if you ever responded with version and configuration
information, I missed it.  The solution to parts of what you
describe would be different in different versions.  In particular,
you might be able to solve checkpoint-related lockup issues and then
improve performance by using bigger batches.  Right now I would be
guessing at what might work for you.

-Kevin

Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
> crank it up more and delay the checkpoints as much as possible during
> these updates.  64 segments is already 1024M.

We have 425M rows, total table size is 78GB, so we can imagine a worst case
UPDATE write is less than 200 bytes * number of rows specified in the update
(is that logic correct?).

Interestingly, the total index size is 148GB, twice that of the table, which
may be an indication of where the performance bottleneck is.


Re: Massive table (500M rows) update nightmare

From
Andreas Kretschmer
Date:
Carlo Stonebanks <stonec.register@sympatico.ca> wrote:

> Inerestingly, the total index size is 148GB, twice that of the table,
> which may be an indication of where the performance bottleneck is.

Maybe a sign for massive index-bloat?
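
One way to check, sketched against the standard 8.3 catalog functions (sizes
here are reported per index, so an oversized or redundant index stands out):

SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indrelid = 'mdx_core.audit_impt'::regclass
ORDER BY pg_relation_size(c.oid) DESC;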


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
>I thought that post mentioned that the plan
> was one statement in an iteration, and that the cache would have
> been primed by a previous query checking whether there were any rows
> to update.  If that was the case, it might be worthwhile to look at
> the entire flow of an iteration.

This is the only SQL query in the code in question - the rest of the code
manages the looping and commit. The code was copied to PgAdminIII and values
written in for the WHERE clause. In order for me to validate that rows would
have been updated, I had to run a SELECT with the same WHERE clause in
PgAdminIII first to see how many rows would have qualified. But this was for
testing purposes only. The SELECT statement does not exist in the code. The
vast majority of the rows that will be processed will be updated as this is
a backfill to synch the old rows with the values being filled into new
columns now being inserted.

> Also, if you ever responded with version and configuration
> information, I missed it.

This is hosted on a new server the client set up so I am waiting for the
exact OS and hardware config. PG Version is PostgreSQL 8.3.6, compiled by
Visual C++ build 1400, OS appears to be Windows 2003 x64 Server.

More than anything, I am more concerned with the long-term use of the
system. This particular challenge with the 500M row update is one thing, but
I am concerned about the exceptional effort required to do this. Is it
REALLY this exceptional to want to update 500M rows of data in this day and
age? Or is the fact that we are considering dumping and restoring, dropping
indexes, etc. to do it all an early warning that we don't have a solution
that is scaled to the problem?

Config data follows (I am assuming commented values which I did not include
are defaulted).

Carlo

autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 500
bgwriter_lru_maxpages = 100
checkpoint_segments = 64
checkpoint_warning = 290
datestyle = 'iso, mdy'
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
log_destination = 'stderr'
log_line_prefix = '%t '
logging_collector = on
maintenance_work_mem = 16MB
max_connections = 200
max_fsm_pages = 204800
max_locks_per_transaction = 128
port = 5432
shared_buffers = 500MB
vacuum_cost_delay = 100
work_mem = 512MB


Re: Massive table (500M rows) update nightmare

From
Kenneth Marshall
Date:
On Fri, Jan 08, 2010 at 12:38:46PM -0500, Carlo Stonebanks wrote:
>> I thought that post mentioned that the plan
>> was one statement in an iteration, and that the cache would have
>> been primed by a previous query checking whether there were any rows
>> to update.  If that was the case, it might be worthwhile to look at
>> the entire flow of an iteration.
>
> This is the only SQL query in the code in question - the rest of the code
> manages the looping and commit. The code was copied to PgAdminIII and
> values written in for the WHERE clause. In order for me to validate that
> rows would have been updated, I had to run a SELECT with the same WHERE
> clause in PgAdminIII first to see how many rows would have qualified. But
> this was for testing purposes only. The SELECT statement does not exist in
> the code. The vast majority of the rows that will be processed will be
> updated as this is a backfill to synch the old rows with the values being
> filled into new columns now being inserted.
>
>> Also, if you ever responded with version and configuration
>> information, I missed it.
>
> This is hosted on a new server the client set up so I am waiting for the
> exact OS and hardware config. PG Version is PostgreSQL 8.3.6, compiled by
> Visual C++ build 1400, OS appears to be Windows 2003 x64 Server.
>
> More than anything, I am more concerned with the long-term use of the
> system. This particular challenge with the 500M row update is one thing,
> but I am concerned about the exceptional effort required to do this. Is it
> REALLY this exceptional to want to update 500M rows of data in this day and
> age? Or is the fact that we are considering dumping and restoring and
> dropping indexes, etc to do all an early warning that we don't have a
> solution that is scaled to the problem?
>
> Config data follows (I am assuming commented values which I did not include
> are defaulted).
>
> Carlo
>

Hi Carlo,

It all boils down to resource management and constraints. For small
problems relative to the amount of system resources available, little
effort is needed to have satisfactory performance. As the problems
consume more and more of the total resource capacity, you will need
to know more and more in depth about the workings of every piece of
the system to wring the most possible performance out of the system.
Some of the discussions on this topic have covered a smattering of
details that will become increasingly important as your system scales
and determine whether or not it will scale. Many times the need for
updates on such a massive scale does point to normalization problems.

My two cents.

Cheers,
Ken

Re: Massive table (500M rows) update nightmare

From
Greg Smith
Date:
Carlo Stonebanks wrote:
> This is hosted on a new server the client set up so I am waiting for
> the exact OS and hardware config. PG Version is PostgreSQL 8.3.6,
> compiled by Visual C++ build 1400, OS appears to be Windows 2003 x64
> Server.
>
> More than anything, I am more concerned with the long-term use of the
> system. This particular challenge with the 500M row update is one
> thing, but I am concerned about the exceptional effort required to do
> this. Is it REALLY this exceptional to want to update 500M rows of
> data in this day and age? Or is the fact that we are considering
> dumping and restoring and dropping indexes, etc to do all an early
> warning that we don't have a solution that is scaled to the problem?

It's certainly not common or easy to handle.  If someone told me I had
to make that task perform well and the tools at hand were anything
other than a largish UNIX-ish server with a properly designed disk
subsystem, I'd tell them it's unlikely to work well.  An UPDATE is the
most intensive single operation you can do in PostgreSQL; the full
lifecycle of executing it requires:

-Creating a whole new row
-Updating all the indexes to point to the new row
-Marking the original row dead
-Pruning the original row out of the database once it's no longer
visible anywhere (VACUUM)

As a rule, if you can't fit a large chunk of the indexes involved in
shared_buffers on your system, this is probably going to have terrible
performance.  And you're on a platform where that's very hard to do,
even if there's a lot of RAM around.  It sounds like your system spends
all its time swapping index blocks in and out of the database buffer
cache here.  That suggests there's a design problem here either with
those indexes (they're too big) or with the provisioned
hardware/software combination (needs more RAM, faster disks, or a
platform where large amounts of RAM work better).

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Massive table (500M rows) update nightmare

From
"Kevin Grittner"
Date:
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:

> In order for me to validate that rows would have been updated, I
> had to run a SELECT with the same WHERE clause in PgAdminIII first
> to see how many rows would have qualified. But this was for
> testing purposes only.  The SELECT statement does not exist in the
> code.

OK, I did misunderstand your earlier post.  Got it now, I think.

> This is hosted on a new server the client set up so I am waiting
> for the exact OS and hardware config. PG Version is PostgreSQL
> 8.3.6, compiled by Visual C++ build 1400, OS appears to be Windows
> 2003 x64 Server.

That might provide more clues, when you get it.

> bgwriter_lru_maxpages = 100

With the large database size and the apparent checkpoint-related
delays, I would make that more aggressive.  Getting dirty pages to
the OS cache reduces how much physical I/O needs to happen during
checkpoint.  We use this on our large databases:

bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4.0

Boosting your checkpoint_completion_target along with or instead of
a more aggressive background writer might also help.

> max_fsm_pages = 204800

This looks suspiciously low for the size of your database.  If you
do a VACUUM VERBOSE (for the database), what do the last few lines
show?

> work_mem = 512MB

That's OK only if you are sure you don't have a lot of connections
requesting that much RAM at one time, or you could drive yourself
into swapping.

I hope this helps.

-Kevin

Re: Massive table (500M rows) update nightmare

From
Pierre Frédéric Caillaud
Date:
>> crank it up more and delay the checkpoints as much as possible during
>> these updates.  64 segments is already 1024M.
>
> We have 425M rows, total table size is 78GB, so we can imagine a worst
> case UPDATE write is less than 200 bytes * number of rows specified in
> the update (is that logic correct?).

There is also the WAL: all these updates need to be logged, which doubles
the UPDATE write throughput. Perhaps you're WAL-bound (every 16MB segment
needs fsyncing), and tuning of fsync and wal_buffers, or a faster WAL
disk, could help? (I don't remember your config).

> Inerestingly, the total index size is 148GB, twice that of the table,
> which may be an indication of where the performance bottleneck is.

Index updates can create random I/O (suppose you have a btree on a rather
random column)...


New server to improve performance on our large and busy DB - advice?

From
"Carlo Stonebanks"
Date:
My client just informed me that new hardware is available for our DB server.

. Intel Core 2 Quads Quad
. 48 GB RAM
. 4 Disk RAID drive (RAID level TBD)

I have put the ugly details of what we do with our DB below, as well as the
postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB
with very large tables and the server is always busy serving a constant
stream of single-row UPDATEs and INSERTs from parallel automated processes.

There are less than 10 users, as the server is devoted to the KB production
system.

My questions:

1) Which RAID level would you recommend
2) Which Windows OS would you recommend? (currently 2008 x64 Server)
3) If we were to port to a *NIX flavour, which would you recommend? (which
support trouble-free PG builds/makes please!)
4) Is this the right PG version for our needs?

Thanks,

Carlo

The details of our use:

. The DB hosts a data warehouse and a knowledgebase (KB) tracking the
professional information of 1.3M individuals.
. The KB tables related to these 130M individuals are naturally also large
. The DB is in a perpetual state of serving TCL-scripted Extract, Transform
and Load (ETL) processes
. These ETL processes typically run 10 at-a-time (i.e. in parallel)
. We would like to run more, but the server appears to be the bottleneck
. The ETL write processes are 99% single row UPDATEs or INSERTs.
. There are few, if any DELETEs
. The ETL source data are "import tables"
. The import tables are permanently kept in the data warehouse so that we
can trace the original source of any information.
. There are 6000+ import tables and counting
. The import tables range from dozens to hundreds of thousands of rows.
They rarely require more than a pkey index.
. Linking the KB to the source import data requires an "audit table" of 500M
rows, and counting.
. The size of the audit table makes it very difficult to manage, especially
if we need to modify the design.
. Because we query the audit table different ways to audit the ETL processes
decisions, almost every column in the audit table is indexed.
. The maximum number of physical users is 10 and these users RARELY perform
any kind of write
. By contrast, the 10+ ETL processes are writing constantly
. We find that internal stats drift, for whatever reason, causing row seq
scans instead of index scans.
. So far, we have never seen a situation where a seq scan has improved
performance, which I would attribute to the size of the tables
. We believe our requirements are exceptional, and we would benefit
immensely from setting up the PG planner to always favour index-oriented
decisions - which seems to contradict everything that PG advice suggests as
best practice.

Current non-default conf settings are:

autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 500
bgwriter_lru_maxpages = 100
checkpoint_segments = 64
checkpoint_warning = 290
datestyle = 'iso, mdy'
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
log_destination = 'stderr'
log_line_prefix = '%t '
logging_collector = on
maintenance_work_mem = 16MB
max_connections = 200
max_fsm_pages = 204800
max_locks_per_transaction = 128
port = 5432
shared_buffers = 500MB
vacuum_cost_delay = 100
work_mem = 512MB



Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
Guys, I want to thank you for all of the advice - my client has just made a
surprise announcement that he would like to start from scratch with a
new server, so I am afraid that all of this great advice has to be seen in
the context of whatever decision is made on that. I am out there,
hat-in-hand, looking for advice under the PERFORM post: "New server to
improve performance on our large and  busy DB - advice?"

Thanks again!

Carlo


"Scott Marlowe" <scott.marlowe@gmail.com> wrote in message
news:dcc563d11001071740q572cdae2re410788fe790dbf6@mail.gmail.com...
> On Thu, Jan 7, 2010 at 2:48 PM, Carlo Stonebanks
> <stonec.register@sympatico.ca> wrote:
>> Doing the updates in smaller chunks resolved these apparent freezes - or,
>> more specifically, when the application DID freeze, it didn't do it for
>> more
>> than 30 seconds. In all likelyhood, this is the OS and the DB thrashing.
>
> It might well be checkpoints.  Have you tried cranking up checkpoint
> segments to something like 100 or more and seeing how it behaves then?
>


Re: New server to improve performance on our large and busy DB -advice?

From
"Joshua D. Drake"
Date:
On Thu, 14 Jan 2010 14:17:13 -0500, "Carlo Stonebanks"
<stonec.register@sympatico.ca> wrote:
> My client just informed me that new hardware is available for our DB
> server.
>
> . Intel Core 2 Quads Quad
> . 48 GB RAM
> . 4 Disk RAID drive (RAID level TBD)
>
> I have put the ugly details of what we do with our DB below, as well as
> the
> postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB

> with very large tables and the server is always busy serving a constant
> stream of single-row UPDATEs and INSERTs from parallel automated
processes.
>
> There are less than 10 users, as the server is devoted to the KB
> production
> system.
>
> My questions:
>
> 1) Which RAID level would you recommend

10

> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

If you have to run Windows... that works.

> 3) If we were to port to a *NIX flavour, which would you recommend?
(which
> support trouble-free PG builds/makes please!)

Community driven:
Debian Stable
CentOS 5

Commercial:
Ubuntu LTS
RHEL 5

> 4) Is this the right PG version for our needs?

You want to run at least the latest stable 8.3 series which I believe is
8.3.9.
With the imminent release of 8.5 (6 months), it may be time to move to
8.4.2 instead.


Joshua D. Drake


>
> Thanks,
>
> Carlo
>
> The details of our use:
>
> . The DB hosts is a data warehouse and a knowledgebase (KB) tracking the

> professional information of 1.3M individuals.
> . The KB tables related to these 130M individuals are naturally also
large
> . The DB is in a perpetual state of serving TCL-scripted Extract,
> Transform
> and Load (ETL) processes
> . These ETL processes typically run 10 at-a-time (i.e. in parallel)
> . We would like to run more, but the server appears to be the bottleneck
> . The ETL write processes are 99% single row UPDATEs or INSERTs.
> . There are few, if any DELETEs
> . The ETL source data are "import tables"
> . The import tables are permanently kept in the data warehouse so that
we
> can trace the original source of any information.
> . There are 6000+ and counting
> . The import tables number from dozens to hundreds of thousands of rows.

> They rarely require more than a pkey index.
> . Linking the KB to the source import date requires an "audit table" of
> 500M
> rows, and counting.
> . The size of the audit table makes it very difficult to manage,
> especially
> if we need to modify the design.
> . Because we query the audit table different ways to audit the ETL
> processes
> decisions, almost every column in the audit table is indexed.
> . The maximum number of physical users is 10 and these users RARELY
> perform
> any kind of write
> . By contrast, the 10+ ETL processes are writing constantly
> . We find that internal stats drift, for whatever reason, causing row
seq
> scans instead of index scans.
> . So far, we have never seen a situation where a seq scan has improved
> performance, which I would attribute to the size of the tables
> . We believe our requirements are exceptional, and we would benefit
> immensely from setting up the PG planner to always favour index-oriented

> decisions - which seems to contradict everything that PG advice suggests
> as
> best practice.
>
> Current non-default conf settings are:
>
> autovacuum = on
> autovacuum_analyze_scale_factor = 0.1
> autovacuum_analyze_threshold = 250
> autovacuum_naptime = 1min
> autovacuum_vacuum_scale_factor = 0.2
> autovacuum_vacuum_threshold = 500
> bgwriter_lru_maxpages = 100
> checkpoint_segments = 64
> checkpoint_warning = 290
> datestyle = 'iso, mdy'
> default_text_search_config = 'pg_catalog.english'
> lc_messages = 'C'
> lc_monetary = 'C'
> lc_numeric = 'C'
> lc_time = 'C'
> log_destination = 'stderr'
> log_line_prefix = '%t '
> logging_collector = on
> maintenance_work_mem = 16MB
> max_connections = 200
> max_fsm_pages = 204800
> max_locks_per_transaction = 128
> port = 5432
> shared_buffers = 500MB
> vacuum_cost_delay = 100
> work_mem = 512MB

--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Massive table (500M rows) update nightmare

From
Craig James
Date:
Carlo Stonebanks wrote:
> Guys, I want to thank you for all of the advice - my client has just
> made a surprise announcement that he would like to set start from
> scratch with a new server, so I am afraid that all of this great advice
> has to be seen in the context of whatever decision is made on that. I am
> out there, hat-in-hand, looking for advice under the PERFORM post: "New
> server to improve performance on our large and  busy DB - advice?"

You might start this as a new topic with a relevant title, and reiterate your database requirements.  Otherwise it will
get submerged as just a footnote to your original question.  It's really nice to be able to quickly find the
new-equipment discussions.

Craig

Re: New server to improve performance on our large and busy DB - advice?

From
Scott Marlowe
Date:
On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> My client just informed me that new hardware is available for our DB server.
>
> . Intel Core 2 Quads Quad
> . 48 GB RAM
> . 4 Disk RAID drive (RAID level TBD)
>
> I have put the ugly details of what we do with our DB below, as well as the
> postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB
> with very large tables and the server is always busy serving a constant
> stream of single-row UPDATEs and INSERTs from parallel automated processes.
>
> There are less than 10 users, as the server is devoted to the KB production
> system.
>
> My questions:
>
> 1) Which RAID level would you recommend

RAID-10 with a battery backed hardware caching controller.

> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

That's probably the most stable choice out there for Windows.

> 3) If we were to port to a *NIX flavour, which would you recommend? (which
> support trouble-free PG builds/makes please!)

I'd parrot what Joshua Drake said here.  Centos / RHEL / Debian / Ubuntu

> 4) Is this the right PG version for our needs?

8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
had, and still am having, problems with it crashing in production.
Not often, maybe once every couple of months, but just enough that I'm
not ready to try and use it there yet.  And I can't force the same
failure in testing, at least not yet.

> The details of our use:
>
> . These ETL processes typically run 10 at-a-time (i.e. in parallel)
> . We would like to run more, but the server appears to be the bottleneck
> . The ETL write processes are 99% single row UPDATEs or INSERTs.

Can you run the ETL processes in such a way that they can do many
inserts and updates at once?  That would certainly speed things up a
bit.

> . The size of the audit table makes it very difficult to manage, especially
> if we need to modify the design.

You might want to look into partitioning / inheritance if that would help.

> . Because we query the audit table different ways to audit the ETL processes
> decisions, almost every column in the audit table is indexed.

This may or may not help.  If you're querying it and the part in the
where clause referencing this column isn't very selective, an index
won't be chosen anyway.  If you've got multiple columns in your where
clause, the more selective ones will use an index and the rest will
get filtered out instead of using an index.  Look in
pg_stat_user_indexes for indexes that don't get used and drop them
unless, of course, they're unique indexes.
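
A hedged sketch of that check against the standard statistics view (an
idx_scan of zero only means "never used since the stats were last reset", so
interpret with care):

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;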

> . The maximum number of physical users is 10 and these users RARELY perform
> any kind of write
> . By contrast, the 10+ ETL processes are writing constantly

You may be well served by having two servers, one to write to, and a
slave that is used by the actual users.  Our slony slaves have a much
easier time writing out their data than our master database does.

> . We find that internal stats drift, for whatever reason, causing row seq
> scans instead of index scans.

Yeah, this is a known problem on heavily updated tables and recent
entries.  Cranking up autovacuum a bit can help, but often it requires
special treatment, either by adjusting the autovac analyze threshold
values for just those tables, or running manual analyzes every couple
of minutes.

> . So far, we have never seen a situation where a seq scan has improved
> performance, which I would attribute to the size of the tables

Not so much the size of the tables, as the size of the request.  If
you were running aggregates across whole large tables, a seq scan
would definitely be the way to go.  If you're asking for one row,
index scan should win.  Somewhere between those two, when you get up
to hitting some decent percentage of the rows, the switch from index
scan to seq scan makes sense, and it's likely happening too early for
you.  Look at random_page_cost and effective_cache_size for starters.
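
Both of those are plain postgresql.conf settings; the values below are only
illustrative starting points for a machine with 48 GB of RAM, not
recommendations from the thread:

effective_cache_size = 32GB    # roughly what the OS plus shared_buffers can be expected to cache
random_page_cost = 2.0         # default is 4.0; lower it if most reads are served from cache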

> . We believe our requirements are exceptional, and we would benefit
> immensely from setting up the PG planner to always favour index-oriented
> decisions - which seems to contradict everything that PG advice suggests as
> best practice.

See previous comment I made up there ^^^  It's not about always using
indexes, it's about giving the planner the information it needs to
make the right choice.

> Current non-default conf settings are:
>
> autovacuum = on
> autovacuum_analyze_scale_factor = 0.1

You might wanna lower the analyze scale factor if you're having
problems with bad query plans on fresh data.

> autovacuum_analyze_threshold = 250
> autovacuum_naptime = 1min
> autovacuum_vacuum_scale_factor = 0.2
> autovacuum_vacuum_threshold = 500
> bgwriter_lru_maxpages = 100
> checkpoint_segments = 64
> checkpoint_warning = 290
> datestyle = 'iso, mdy'
> default_text_search_config = 'pg_catalog.english'
> lc_messages = 'C'
> lc_monetary = 'C'
> lc_numeric = 'C'
> lc_time = 'C'
> log_destination = 'stderr'
> log_line_prefix = '%t '
> logging_collector = on
> maintenance_work_mem = 16MB
> max_connections = 200
> max_fsm_pages = 204800

The default tends to be low.  Run vacuum verbose to see if you're
overrunning the max_fsm_pages settings or the max_fsm_relations.

> max_locks_per_transaction = 128
> port = 5432
> shared_buffers = 500MB
> vacuum_cost_delay = 100

That's REALLY REALLY high.  You might want to look at something in the
5 to 20 range.

> work_mem = 512MB

Re: Massive table (500M rows) update nightmare

From
"Carlo Stonebanks"
Date:
My bad - I thought I had, so it has been re-posted with a (v2) disclaimer in
the title... like THAT will stop the flaming! <g>

Thanks for your patience!

"Craig James" <craig_james@emolecules.com> wrote in message
news:4B4F8A49.7010906@emolecules.com...
> Carlo Stonebanks wrote:
>> Guys, I want to thank you for all of the advice - my client has just made
>> a surprise announcement that he would like to set start from scratch with
>> a new server, so I am afraid that all of this great advice has to be seen
>> in the context of whatever decision is made on that. I am out there,
>> hat-in-hand, looking for advice under the PERFORM post: "New server to
>> improve performance on our large and  busy DB - advice?"
>
> You might start this as a new topic with a relevant title, and reiterate
> your database requirements.  Otherwise it will get submerged as just a
> footnote to your original question.  It's really nice to be able to
> quickly find the new-equipment discussions.
>
> Craig
>


Re: New server to improve performance on our large and busy DB - advice?

From
Greg Smith
Date:
Carlo Stonebanks wrote:
> 1) Which RAID level would you recommend

It looks like you stepped over a critical step, which is "will the
server have a good performing RAID card?".  Your whole upgrade could
underperform if you make a bad mistake on that part.  It's really
important to nail that down, and to benchmark to prove you got what you
expected from your hardware vendor.

> 3) If we were to port to a *NIX flavour, which would you recommend?
> (which support trouble-free PG builds/makes please!)

The only platform I consider close to trouble free as far as the PG
builds working without issues are RHEL/CentOS, due to the maturity of
the PGDG yum repository and how up to date it's kept.  Every time I
wander onto another platform I find the lag and care taken in packaging
PostgreSQL to be at least a small step down from there.

> 4) Is this the right PG version for our needs?

8.4 removes the FSM, which takes away a common source for unexpected
performance issues when you overflow max_fsm_pages one day.  If you're
going to deploy 8.3, you need to be more careful to monitor the whole
VACUUM process; it's easier to ignore in 8.4 and still get by OK.  As
far as general code stability goes, I think it's a wash at this point.
You might discover a bug in 8.4 that causes a regression, but I think
you're just as likely to run into a situation that 8.3 handles badly
that's improved in 8.4.  Hard to say which will work out better in a
really general way.

> . We believe our requirements are exceptional, and we would benefit
> immensely from setting up the PG planner to always favour
> index-oriented decisions - which seems to contradict everything that
> PG advice suggests as best practice.

Pretty much everyone thinks their requirements are exceptional.  It's
funny how infrequently that's actually true.  The techniques that favor
index-use aren't that unique:  collect better stats, set basic
parameters correctly, adjust random_page_cost, investigate plans that
don't do what you want to figure out why.  It's easy to say there's
something special about your data rather than follow fundamentals here;
I'd urge you to avoid doing that.  The odds are better than most people
think that the real issue is you're feeding the optimizer bad data, which
brings us to:

> Current non-default conf settings are:

I don't see effective_cache_size listed there.  If that's at the
default, I wouldn't be surprised that you're seeing sequential scans
instead of indexed ones far too often.
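Concretely, the fundamentals above boil down to a short checklist like
this (the table/column names and the numbers are only placeholders, not
recommendations for your data):

-- see what the planner currently believes
SHOW effective_cache_size;
SHOW random_page_cost;
SHOW default_statistics_target;
-- experiment per-session before touching postgresql.conf
SET random_page_cost = 2.0;
-- raise statistics only where the estimates are bad
ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 200;
ANALYZE some_table;
-- then run the problem query under EXPLAIN ANALYZE and compare the
-- estimated row counts against the actual ones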


> max_connections = 200
> work_mem = 512MB

This is a frightening combination by the way.
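To spell out why: each of those 200 connections can allocate work_mem for
every sort or hash step in its plan, so the worst case blows right past the
48GB of RAM in the box:

-- back-of-the-envelope worst case, ignoring multiple sorts per query
SELECT 200 * 512 AS worst_case_mb;  -- 102400 MB, i.e. roughly 100 GB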

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: New server to improve performance on our large and busy DB - advice?

From
"Carlo Stonebanks"
Date:
> Pretty much everyone thinks their requirements are exceptional.  It's
> funny how infrequently that's actually true.  The techniques that favor
> index-use aren't that unique:  collect better stats, set basic parameters
> correctly, adjust random_page_cost, investigate plans that don't do what
> you want to figure out why.  It's easy to say there's something special
> about your data rather than follow fundamentals here; I'd urge you to
> avoid doing that.  The odds that the real issue is that you're feeding the
> optimizer bad data is more likely than most people think, which brings us
> to:

I understand that. And the answer is usually to go and do an ANALYZE
manually (if it isn't this, it will be some dependency on a set-returning
stored function we wrote before we could specify the rows and cost). My
question is really - why do I need this constant intervention? When we
rarely do aggregates, when our queries are (nearly) always single row
queries (and very rarely more than 50 rows) out of tables that have hundreds
of thousands to millions of rows, what does it take to NOT have to
intervene? Which brings me to your next point:

> I don't see effective_cache_size listed there.  If that's at the default,
> I wouldn't be surprised that you're seeing sequential scans instead of
> indexed ones far too often.

Nice to know - I suspect someone has been messing around with stuff they
don't understand. I do know that after some screwing around they got the
server to the point that it wouldn't restart and tried to back out until it
would.

>> max_connections = 200
>> work_mem = 512MB

> This is a frightening combination by the way.

Looks like it's connected to the above issue. The real max connection value
is 1/10th of that.

Thanks Greg!

Carlo


Re: New server to improve performance on our large and busy DB - advice?

From
marcin mank
Date:
On Thu, Jan 14, 2010 at 8:17 PM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> . 48 GB RAM
> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

There is no 64-bit Windows build at the moment - you would be limited to
shared_buffers of about a gigabyte. Choose Linux.

Greetings
Marcin Mańk

Re: New server to improve performance on our large and busy DB - advice?

From
"Carlo Stonebanks"
Date:
Hi Scott,

Sorry for the very late reply on this post, but I'd like to follow up. The
reason that I took so long to reply was due to this suggestion:

<<Run vacuum verbose to see if you're
overrunning the max_fsm_pages settings or the max_fsm_relations.
>>

My first thought was, does he mean against the entire DB? That would take a
week! But, since it was recommended, I decided to see what would happen. So,
I just ran VACUUM VERBOSE. After five days, it was still vacuuming and the
server admin said they needed to bounce the server, which means the command
never completed. (I kept the log of the progress so far, but don't know if
the values you needed would appear at the end.) I confess I have no idea how
to relate the INFO and DETAIL data coming back to the max_fsm_pages or
max_fsm_relations settings.

So, now my questions are:

1) Did you really mean you wanted VACUUM VERBOSE to run against the entire
DB?
2) Given my previous comments on the size of the DB (and my thinking that
this is an exceptionally large and busy DB) were you expecting it to take
this long?
3) I took no exceptional measures before running it, I didn't stop the
automated import processes, I didn't turn off autovacuum. Would this have
accounted for the time it is taking to THAT degree?
4) Any other way to get max_fsm_pages settings and max_fsm_relations?

Carlo


Re: New server to improve performance on our large and busy DB - advice?

From
Scott Marlowe
Date:
On Tue, Jan 19, 2010 at 2:09 PM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> Hi Scott,
>
> Sorry for the very late reply on this post, but I'd like to follow up. The
> reason that I took so long to reply was due to this suggestion:
>
> <<Run vacuum verbose to see if you're
> overrunning the max_fsm_pages settings or the max_fsm_relations.
>>>
>
> My first thought was, does he mean against the entire DB? That would take a
> week! But, since it was recommended, I decided to see what would happen. So,
> I just ran VACUUM VERBOSE. After five days, it was still vacuuming and the
> server admin said they needed to bounce the server, which means the command
> never completed. (I kept the log of the progress so far, but don't know if
> the values you needed would appear at the end.) I confess I have no idea how
> to relate the INFO and DETAIL data coming back to the max_fsm_pages or
> max_fsm_relations settings.

yeah, the values are at the end.  Sounds like your vacuum settings are
too non-aggressive.  Generally this is the vacuum cost delay being too
high.

> So, now my questions are:
>
> 1) Did you really mean you wanted VACUUM VERBOSE to run against the entire
> DB?

Yes.  A whole db at least.  However...

> 2) Given my previous comments on the size of the DB (and my thinking that
> this is an exceptionally large and busy DB) were you expecting it to take
> this long?

Yes, I was figuring it would be a while.  However...

> 3) I took no exceptional measures before running it, I didn't stop the
> automated import processes, I didn't turn off autovacuum. Would this have
> accounted for the time it is taking to THAT degree?

Nah, not really.  However...

> 4) Any other way to get max_fsm_pages settings and max_fsm_relations?

Yes!  You can run vacuum verbose against the regular old postgres
database (or just create one for testing with nothing in it) and
you'll still get the fsm usage numbers from that!  So, no need to run
it against the big db.  However, if regular vacuum verbose couldn't
finish in a week, then you've likely got vacuum and autovacuum set to
be too timid in their operation, and may be getting pretty bloated as
we speak.  Once the fsm gets too blown out of the water, it's quicker
to dump and reload the whole DB than to try and fix it.
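To be concrete, the quick check amounts to something like this, run from
psql against any small database (the FSM totals it prints at the end cover
the whole cluster):

\c postgres
VACUUM VERBOSE;
-- the INFO/DETAIL lines at the very end report how many page slots and
-- relations the free space map needs versus max_fsm_pages / max_fsm_relations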

Re: New server to improve performance on our large and busy DB - advice?

From
"Carlo Stonebanks"
Date:
> yeah, the values are at the end.  Sounds like your vacuum settings are
> too non-aggressive.  Generally this is the vacuum cost delay being too
> high.

Of course, I have to ask: what's the down side?

> Yes!  You can run vacuum verbose against the regular old postgres
> database (or just create one for testing with nothing in it) and
> you'll still get the fsm usage numbers from that!  So, no need to run
> it against the big db.  However, if regular vacuum verbose couldn't
> finish in a week, then you've likely got vacuum and autovacuum set to
> be too timid in their operation, and may be getting pretty bloated as
> we speak.  Once the fsm gets too blown out of the water, it's quicker
> to dump and reload the whole DB than to try and fix it.

My client reports this is what they actually do on a monthly basis.

And the numbers are in:

>> NOTICE:  number of page slots needed (4090224) exceeds max_fsm_pages
>> (204800)
>> HINT:  Consider increasing the configuration parameter "max_fsm_pages" to
>> a value over 4090224.

Gee, only off by a factor of 20. What happens if I go for this number (once
again, what's the down side)?

Carlo


Re: New server to improve performance on our large and busy DB - advice?

From
"Kevin Grittner"
Date:
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:

>>  yeah, the values are at the end.  Sounds like your vacuum
>> settings are too non-aggressive.  Generally this is the vacuum
>> cost delay being too high.
>
> Of course, I have to ask: what's the down side?

If you make it too aggressive, it could impact throughput or
response time.  Odds are that the bloat from having it not
aggressive enough is currently having a worse impact.
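The knobs in question are the cost-based delay settings.  A rough sketch
of where to look (the example numbers are only illustrative, not a
recommendation for your workload):

-- current values; the 8.3 defaults are 0 for manual VACUUM and 20ms for autovacuum
SHOW vacuum_cost_delay;
SHOW autovacuum_vacuum_cost_delay;
-- making autovacuum more aggressive means lowering the delay and/or raising
-- the cost limit in postgresql.conf, for example:
--   autovacuum_vacuum_cost_delay = 10ms
--   autovacuum_vacuum_cost_limit = 400
-- the cost is more background I/O from vacuum while it runs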

>> Once the fsm gets too blown out of the water, it's quicker
>> to dump and reload the whole DB than to try and fix it.
>
> My client reports this is what they actually do on a monthly
> basis.

They probably won't need to do that with proper configuration and
vacuum policies.

>>> NOTICE:  number of page slots needed (4090224) exceeds
>>> max_fsm_pages (204800)
>>> HINT:  Consider increasing the configuration parameter
>>> "max_fsm_pages" to a value over 4090224.
>
> Gee, only off by a factor of 20. What happens if I go for this
> number (once again, what's the down side)?

It costs six bytes of shared memory per entry.

http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM
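For the number in that HINT, the math works out to roughly:

-- ~6 bytes of shared memory per page slot
SELECT pg_size_pretty(4090224::bigint * 6);  -- about 23 MB

so bumping max_fsm_pages to cover it is cheap compared to the bloat.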

-Kevin

Re: New server to improve performance on our large and busy DB - advice?

From
Robert Haas
Date:
On Wed, Jan 20, 2010 at 3:03 PM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
>> Yes!  You can run vacuum verbose against the regular old postgres
>> database (or just create one for testing with nothing in it) and
>> you'll still get the fsm usage numbers from that!  So, no need to run
>> it against the big db.  However, if regular vacuum verbose couldn't
>> finish in a week, then you've likely got vacuum and autovacuum set to
>> be too timid in their operation, and may be getting pretty bloated as
>> we speak.  Once the fsm gets too blown out of the water, it's quicker
>> to dump and reload the whole DB than to try and fix it.
>
> My client reports this is what they actually do on a monthly basis.

Something is deeply wrong with your client's vacuuming policies.

...Robert

Re: New server to improve performance on our large and busy DB - advice?

From
Alvaro Herrera
Date:
Scott Marlowe wrote:
> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
> <stonec.register@sympatico.ca> wrote:

> > 4) Is this the right PG version for our needs?
>
> 8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
> had, and still am having, problems with it crashing in production.
> Not often, maybe once every couple of months, but just enough that I'm
> not ready to try and use it there yet.  And I can't force the same
> failure in testing, at least not yet.

uh.  Is there a report of the crash somewhere with details, say stack
traces and such?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: New server to improve performance on our large and busy DB - advice?

From
Scott Marlowe
Date:
On Thu, Jan 21, 2010 at 8:51 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Scott Marlowe wrote:
>> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
>> <stonec.register@sympatico.ca> wrote:
>
>> > 4) Is this the right PG version for our needs?
>>
>> 8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
>> had, and still am having, problems with it crashing in production.
>> Not often, maybe once every couple of months, but just enough that I'm
>> not ready to try and use it there yet.  And I can't force the same
>> failure in testing, at least not yet.
>
> uh.  Is there a report of the crash somewhere with details, say stack
> traces and such?

No, the only server that does this is in production as our stats db
and when it happens it usually gets restarted immediately.  It does
this about once every two months.  Do the PGDG releases have debugging
symbols and what not?  I'll see about having a stack trace ready to
run for the next time it does this.

Re: New server to improve performance on our large and busy DB - advice?

From
Alvaro Herrera
Date:
Scott Marlowe wrote:
> On Thu, Jan 21, 2010 at 8:51 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > Scott Marlowe wrote:
> >> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
> >> <stonec.register@sympatico.ca> wrote:
> >
> >> > 4) Is this the right PG version for our needs?
> >>
> >> 8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
> >> had, and still am having, problems with it crashing in production.
> >> Not often, maybe once every couple of months, but just enough that I'm
> >> not ready to try and use it there yet.  And I can't force the same
> >> failure in testing, at least not yet.
> >
> > uh.  Is there a report of the crash somewhere with details, say stack
> > traces and such?
>
> No, the only server that does this is in production as our stats db
> and when it happens it usually gets restarted immediately.  It does
> this about once every two months.  Do the PGDG releases have debugging
> symbols and what not?  I'll see about having a stack trace ready to
> run for the next time it does this.

You mean the RPMs?  Yes, I think Devrim publishes debuginfo packages
which you need to install separately.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: New server to improve performance on our large and busy DB - advice?

From
Scott Marlowe
Date:
On Thu, Jan 21, 2010 at 9:44 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Scott Marlowe wrote:
>> On Thu, Jan 21, 2010 at 8:51 AM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>> > Scott Marlowe wrote:
>> >> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
>> >> <stonec.register@sympatico.ca> wrote:
>> >
>> >> > 4) Is this the right PG version for our needs?
>> >>
>> >> 8.3 is very stable.  Update to the latest.  8.4 seems good, but I've
>> >> had, and still am having, problems with it crashing in production.
>> >> Not often, maybe once every couple of months, but just enough that I'm
>> >> not ready to try and use it there yet.  And I can't force the same
>> >> failure in testing, at least not yet.
>> >
>> > uh.  Is there a report of the crash somewhere with details, say stack
>> > traces and such?
>>
>> No, the only server that does this is in production as our stats db
>> and when it happens it usually gets restarted immediately.  It does
>> this about once every two months.  Do the PGDG releases have debugging
>> symbols and what not?  I'll see about having a stack trace ready to
>> run for the next time it does this.
>
> You mean the RPMs?  Yes, I think Devrim publishes debuginfo packages
> which you need to install separately.

Well crap, this one was built from source, and not with debugging.
Gimme a day or so and I'll have it rebuilt with debug and can run a
useful backtrace on it.

Re: New server to improve performance on our large and busy DB - advice?

From
Devrim GÜNDÜZ
Date:
On Thu, 2010-01-21 at 13:44 -0300, Alvaro Herrera wrote:
> I think Devrim publishes debuginfo packages which you need to install
> separately.

Right.
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz


Re: New server to improve performance on our large and busy DB - advice?

From
"Carlo Stonebanks"
Date:
Hi Greg,

As a follow up to this suggestion:

>> I don't see effective_cache_size listed there.  If that's at the default,
>> I wouldn't be surprised that you're seeing sequential scans instead of
>> indexed ones far too often.

I found an article written by you,
http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm, and thought
it was pretty useful, especially this comment:

<<effective_cache_size should be set to how much memory is leftover for disk
caching after taking into account what's used by the operating system,
dedicated PostgreSQL memory, and other applications. If it's set too low,
indexes may not be used for executing queries the way you'd expect. Setting
effective_cache_size to 1/2 of total memory would be a normal conservative
setting. You might find a better estimate by looking at your operating
system's statistics. On UNIX-like systems, add the free+cached numbers from
free or top. On Windows see the "System Cache" in the Windows Task Manager's
Performance tab.
>>

Are these values to look at BEFORE starting PG? If so, how do I relate the
values returned to setting the effective_cache_size values?

Carlo

PS Loved your 1995 era pages. Being a musician, it was great to read your
recommendations on how to buy these things called "CD's". I Googled the
term, and they appear to be some ancient precursor to MP3s which people
actually PAID for. What kind of stone were they engraved on? ;-D



Re: New server to improve performance on our large and busy DB - advice?

From
Greg Smith
Date:
Carlo Stonebanks wrote:
>
> <<effective_cache_size should be set to how much memory is leftover
> for disk caching after taking into account what's used by the
> operating system, dedicated PostgreSQL memory, and other applications.
> If it's set too low, indexes may not be used for executing queries the
> way you'd expect. Setting effective_cache_size to 1/2 of total memory
> would be a normal conservative setting. You might find a better
> estimate by looking at your operating system's statistics. On
> UNIX-like systems, add the free+cached numbers from free or top. On
> Windows see the "System Cache" in the Windows Task Manager's
> Performance tab.
>>>
> Are these values to look at BEFORE starting PG? If so, how do I relate
> the values returned to setting the effective_cache_size values?
>
After starting the database.  You can set effective_cache_size to a size
in megabytes, so basically you'd look at the amount of free cache, maybe
round down a bit, and set effective_cache_size to exactly that.  It's
not super important to get the number right.  The point is that the
default is going to be a tiny number way smaller than the RAM in your
system, and even getting it within a factor of 2 or 3 of reality will
radically change some types of query plans.
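If, say, free+cached on that box settles around 40GB once it has been
running for a while (a number I'm making up purely for illustration), you
can even try it out in a single session first:

-- what it is now; the default is tiny compared to the RAM in the machine
SHOW effective_cache_size;
-- try a realistic value for this session only, then re-run a slow query
-- under EXPLAIN to see whether the plan switches over to the indexes
SET effective_cache_size = '40GB';
-- once you're happy with the plans, put the same value in postgresql.conf
-- and reload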

> PS Loved your 1995 era pages. Being a musician, it was great to read
> your recommendations on how to buy these things called "CD's". I
> Googled the term, and they appear to be some ancient precursor to MP3s
> which people actually PAID for. What kind of stone were they engraved
> on? ;-D

They're plastic, just like the iPod, iPhone, iToilet, or whatever other
white plastic Apple products people listen to music on during this new
era.  Since both my CD collection and the stereo I listen to them on are
each individually worth more than my car, it's really tough to sell me
on all the terrible sounding MP3s I hear nowadays.  I'm the guy who can
tell you how the LP, regular CD, gold CD, and SACD/DVD-A for albums I
like all compare, so dropping below CD quality is right out.  If you
ever find yourself going "hey, I wish I had six different versions of
'Dark Side of the Moon' around so I could compare the subtle differences
in the mastering and mix on each of them", I'm your guy.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com