Thread: Massive table (500M rows) update nightmare
Our DB has an audit table which is 500M rows and growing. (FYI the objects being audited are grouped semantically, not individual field values.)

Recently we wanted to add a new feature and we altered the table to add a new column. We are backfilling this varchar(255) column by writing a TCL script to page through the rows (every update is an UPDATE ... WHERE id >= x AND id < x+10, and a commit is performed after every 1000 UPDATE statements, i.e. every 10,000 rows).

We have 10 columns, six of which are indexed. Rough calculations suggest that this will take two to three weeks to complete on an 8-core CPU with more than enough memory.

As a ballpark estimate - is this sort of performance for a 500M-row update what one would expect of PG given the table structure (detailed below), or should I dig deeper to look for performance issues?

As always, thanks!

Carlo

Table/index structure:

CREATE TABLE mdx_core.audit_impt
(
  audit_impt_id serial NOT NULL,
  impt_session integer,
  impt_version character varying(255),
  impt_name character varying(255),
  impt_id integer,
  target_table character varying(255),
  target_id integer,
  target_op character varying(10),
  note text,
  source_table character varying(255),
  CONSTRAINT audit_impt_pkey PRIMARY KEY (audit_impt_id)
);

CREATE INDEX audit_impt_impt_id_idx ON mdx_core.audit_impt USING btree (impt_id);
CREATE INDEX audit_impt_impt_name ON mdx_core.audit_impt USING btree (impt_name, impt_version);
CREATE INDEX audit_impt_session_idx ON mdx_core.audit_impt USING btree (impt_session);
CREATE INDEX audit_impt_source_table ON mdx_core.audit_impt USING btree (source_table);
CREATE INDEX audit_impt_target_id_idx ON mdx_core.audit_impt USING btree (target_id, audit_impt_id);
CREATE INDEX audit_impt_target_table_idx ON mdx_core.audit_impt USING btree (target_table, target_id, audit_impt_id);
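[Editor's note: for reference, a minimal sketch of what one iteration of the backfill looks like in SQL. The SET expression and the concrete id values are taken from the EXPLAIN ANALYZE Carlo posts later in the thread; the surrounding loop and commit cadence live in the TCL script and are only indicated in comments.]

-- one UPDATE statement touches a 10-row slice of the primary key range
UPDATE mdx_core.audit_impt
   SET source_table = 'mdx_import.' || impt_name
 WHERE audit_impt_id >= 319400001
   AND audit_impt_id <= 319400010
   AND coalesce(source_table, '') = '';
-- the TCL driver issues 1000 such statements, then a COMMIT (i.e. every 10,000 rows)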
On Thu, Jan 7, 2010 at 12:17 AM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> [...]
> As a ballpark estimate - is this sort of performance for a 500M-row update
> what one would expect of PG given the table structure (detailed below) or
> should I dig deeper to look for performance issues?

Got an explain analyze of the update query?
Carlo Stonebanks wrote:
> [...]
> As a ballpark estimate - is this sort of performance for a 500M-row update
> what one would expect of PG given the table structure (detailed below)
> or should I dig deeper to look for performance issues?

If it is possible to lock this audit table exclusively (maybe during off-peak hours) I would look into:
- create new_audit_table as select col1, col2, col3 ... col9, 'new_col_value' from old_audit_table;
- create all indexes
- drop old_audit_table
- rename new_audit_table to old_audit_table

That is probably the fastest method you can use, even if you have to join the "new_col_value" from an extra helper table with the corresponding id. Remember, databases are born to join.

You could also try to just update the whole table in one go, it is probably faster than you expect.

just a thought
Leo
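[Editor's note: a hedged sketch of the rebuild-and-swap approach Leo describes, written against the schema from the original post. The expression used to derive the new column's value is an assumption based on later messages; locking, permissions, and the sequence behind the serial column are only touched on in comments.]

BEGIN;
LOCK TABLE mdx_core.audit_impt IN ACCESS EXCLUSIVE MODE;

-- build the replacement table with the new column already populated
CREATE TABLE mdx_core.audit_impt_new AS
SELECT audit_impt_id, impt_session, impt_version, impt_name, impt_id,
       target_table, target_id, target_op, note,
       'mdx_import.' || impt_name AS source_table
  FROM mdx_core.audit_impt;

-- recreate the primary key and the six indexes (temporary names to avoid
-- clashing with the old objects; rename afterwards if the names matter)
ALTER TABLE mdx_core.audit_impt_new
  ADD CONSTRAINT audit_impt_pkey_new PRIMARY KEY (audit_impt_id);
CREATE INDEX audit_impt_impt_id_idx_new ON mdx_core.audit_impt_new (impt_id);
-- ... the remaining five indexes as in the original DDL ...

-- note: CREATE TABLE AS does not carry over the serial default, so the
-- nextval() default on audit_impt_id has to be re-attached by hand
DROP TABLE mdx_core.audit_impt;
ALTER TABLE mdx_core.audit_impt_new RENAME TO audit_impt;
COMMIT;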
Leo Mannhart <leo.mannhart@beecom.ch> wrote: > You could also try to just update the whole table in one go, it is > probably faster than you expect. That would, of course, bloat the table and indexes horribly. One advantage of the incremental approach is that there is a chance for autovacuum or scheduled vacuums to make space available for re-use by subsequent updates. -Kevin
Kevin Grittner wrote: > Leo Mannhart <leo.mannhart@beecom.ch> wrote: > >> You could also try to just update the whole table in one go, it is >> probably faster than you expect. > > That would, of course, bloat the table and indexes horribly. One > advantage of the incremental approach is that there is a chance for > autovacuum or scheduled vacuums to make space available for re-use > by subsequent updates. > > -Kevin > ouch... thanks for correcting this. ... and forgive an old man coming from Oracle ;) Leo
I would suggest (a rough SQL sketch of the sequence follows the list):
1. turn off autovacuum
1a. eventually tune the db for better performance for this kind of operation (can't help here)
2. restart database
3. drop all indexes
4. update
5. vacuum full table
6. create indexes
7. turn on autovacuum
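[Editor's note: a rough sketch of steps 3-6 in SQL. The index names come from the original post, the SET expression is the one Carlo shows later in the thread, and the autovacuum toggle and restart in steps 1-2 and 7 are postgresql.conf changes that are not shown. Note Kevin's caveat further down the thread about how long VACUUM FULL can take at this scale.]

-- step 3: drop the secondary indexes (the primary key stays)
DROP INDEX mdx_core.audit_impt_impt_id_idx;
DROP INDEX mdx_core.audit_impt_impt_name;
DROP INDEX mdx_core.audit_impt_session_idx;
DROP INDEX mdx_core.audit_impt_source_table;
DROP INDEX mdx_core.audit_impt_target_id_idx;
DROP INDEX mdx_core.audit_impt_target_table_idx;

-- step 4: one pass over the whole table
UPDATE mdx_core.audit_impt
   SET source_table = 'mdx_import.' || impt_name;

-- step 5: reclaim the dead tuples left behind by the update
VACUUM FULL mdx_core.audit_impt;

-- step 6: recreate the indexes, e.g.:
CREATE INDEX audit_impt_impt_id_idx ON mdx_core.audit_impt USING btree (impt_id);
-- ... and the other five as in the original DDL ...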
Ludwik
2010/1/7 Leo Mannhart <leo.mannhart@beecom.ch>
Kevin Grittner wrote:
> Leo Mannhart <leo.mannhart@beecom.ch> wrote:
>
>> You could also try to just update the whole table in one go, it is
>> probably faster than you expect.
>
> That would, of course, bloat the table and indexes horribly. One
> advantage of the incremental approach is that there is a chance for
> autovacuum or scheduled vacuums to make space available for re-use
> by subsequent updates.
>
> -Kevin
>
ouch... thanks for correcting this.
... and forgive an old man coming from Oracle ;)
Leo
--
Ludwik Dyląg
Ludwik Dylag <ldylag@gmail.com> wrote:
> I would suggest:
> 1. turn off autovacuum
> 1a. eventually tune the db for better performance for this kind of
> operation (can't help here)
> 2. restart database
> 3. drop all indexes
> 4. update
> 5. vacuum full table
> 6. create indexes
> 7. turn on autovacuum

I've only ever attempted something like that with a few tens of millions of rows. I gave up on waiting for the VACUUM FULL step after a few days.

If some scheduled down time is acceptable (with "some" being kind of hard to estimate accurately) the best bet would be to add the column with the USING clause to fill in the value. (I think that would cause a table rewrite; if not, then add something to the ALTER TABLE which would.) My impression is that the OP would rather stretch out the implementation than suffer down time, which can certainly be a valid call.

If that is the goal, then the real question is whether there's a way to tune the incremental updates to speed up that phase. Carlo, what version of PostgreSQL is this? Can you show us the results of an EXPLAIN ANALYZE for the run of one iteration of the UPDATE? Information on the OS, hardware, PostgreSQL build configuration, and the contents of postgresql.conf (excluding all comments) could help us spot possible techniques to speed this up.

-Kevin
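[Editor's note: Kevin presumably has something like the following in mind. This is only a guess at the statement, since ADD COLUMN itself has no USING clause; the closest equivalent is adding the column and then changing its type with a USING expression, which forces a full table rewrite (and index rebuild) and holds an exclusive lock for its duration - hence the "scheduled down time" caveat.]

-- the column has already been added by the OP; shown here for completeness
ALTER TABLE mdx_core.audit_impt ADD COLUMN source_table character varying(255);

-- fill it for every row in a single rewrite pass; the USING expression
-- may refer to other columns of the same row
ALTER TABLE mdx_core.audit_impt
  ALTER COLUMN source_table TYPE character varying(255)
  USING 'mdx_import.' || impt_name;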
On Thursday 07 January 2010 09:57:59 Kevin Grittner wrote:
> [...]
> If that is the goal, then the real question is whether there's a way
> to tune the incremental updates to speed up that phase.

If you can come up with an effective method of tracking updates/deletes/inserts, such as a trigger that writes the PK to a separate table upon any inserts, updates or deletes to the table, you could do something like this:

1) create a new table (no indexes) with the structure you want the table to have at the end of the process (i.e. the post-altered state) [new_tab]
2) create the insert/update/delete triggers mentioned above on the existing table [curr_tab] and write all the PK ids that change into a 3rd table [changed_keys]
3) kick off a process that simply does a select from curr_tab into new_tab and populates/back-fills the new column as part of the query
4) let it run as long as it takes
5) once it's complete do this:

Create all the indexes on new_tab

BEGIN;
LOCK TABLE curr_tab;
DELETE from new_tab where pk_id in (select distinct pk_id from changed_keys);
INSERT into new_tab select * from curr_tab where curr_tab.pk_id in (select distinct pk_id from changed_keys);
ALTER TABLE curr_tab RENAME to old_tab;
ALTER TABLE new_tab RENAME to curr_tab;
COMMIT;

Also you might want to consider partitioning this table in the process...

Once you're confident you no longer need the old table [old_tab] you can drop it.
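[Editor's note: a minimal sketch of the change-tracking trigger described in step 2, assuming plpgsql is available; the changed_keys table and track_changed_keys function names are hypothetical, and curr_tab here stands in for mdx_core.audit_impt with its real primary key column.]

CREATE TABLE changed_keys (pk_id integer);

CREATE OR REPLACE FUNCTION track_changed_keys() RETURNS trigger AS $$
BEGIN
  -- record the primary key of every row touched in the current table
  IF TG_OP = 'DELETE' THEN
    INSERT INTO changed_keys VALUES (OLD.audit_impt_id);
  ELSE
    INSERT INTO changed_keys VALUES (NEW.audit_impt_id);
  END IF;
  RETURN NULL;  -- AFTER row trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER curr_tab_track_changes
  AFTER INSERT OR UPDATE OR DELETE ON mdx_core.audit_impt
  FOR EACH ROW EXECUTE PROCEDURE track_changed_keys();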
> If it is possible to lock this audit table exclusively (maybe during
> off-peak hours) I would look into
> - create new_audit_table as select col1, col2, col3 ... col9,
> 'new_col_value' from old_audit_table;
> - create all indexes
> - drop old_audit_table
> - rename new_audit_table to old_audit_table

This is a good approach, but you don't necessarily have to exclusively lock the table. Only allowing reads would be enough, or you could install a trigger to keep track of which rows were updated. Then the process becomes:

1. Create trigger on the old table to store changed pks
2. Create new_audit_table as select col1, col2, col3 ... col9, 'new_col_value' from old_audit_table;
3. Create all indexes on the new table
4. Stop your app from writing to the old table
5. COPY over the rows that have changed
6. Rename the old table to something else (for safety)
7. Rename the new table to the real name
8. Drop the old table when all is good

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
> Got an explain analyze of the update query?

UPDATE mdx_core.audit_impt
SET source_table = 'mdx_import.'||impt_name
WHERE audit_impt_id >= 319400001 AND audit_impt_id <= 319400010
AND coalesce(source_table, '') = ''

Index Scan using audit_impt_pkey on audit_impt (cost=0.00..92.63 rows=1 width=608) (actual time=0.081..0.244 rows=10 loops=1)
  Index Cond: ((audit_impt_id >= 319400001) AND (audit_impt_id <= 319400010))
  Filter: ((COALESCE(source_table, ''::character varying))::text = ''::text)
Total runtime: 372.141 ms

Hard to tell how reliable these numbers are, because the caches are likely spun up for the WHERE clause - in particular, SELECT queries have been run to test whether the rows actually qualify for the update. The coalesce may be slowing things down slightly, but is a necessary evil.
> If it is possible to lock this audit table exclusively (maybe during
> off-peak hours) I would look into
> - create new_audit_table as select col1, col2, col3 ... col9,
> 'new_col_value' from old_audit_table;
> - create all indexes
> - drop old_audit_table
> - rename new_audit_table to old_audit_table
>
> That is probably the fastest method you can use, even if you have to join
> the "new_col_value" from an extra helper table with the corresponding id.
> Remember, databases are born to join.

This has all been done before - the production team was crippled while they waited for this, and the SECOND the table was available again they jumped on it - even though it meant recreating the bare minimum of the indexes.

> You could also try to just update the whole table in one go, it is
> probably faster than you expect.

Possibly, but with such a large table you have no idea of the progress, and you cannot interrupt it without rolling back everything. Worse, you have applications stalling and users wanting to know what is going on - is the OS and the DB/MVCC thrashing while it does internal maintenance? Have you reached some sort of deadlock condition that you can't see because the server status is not helpful with so many uncommitted pending updates? And of course, there is the file bloat.
> every update is an UPDATE ... WHERE id >= x AND id < x+10 and a commit is
> performed after every 1000 UPDATE statements, i.e. every 10000 rows.

What is the rationale behind this? How about doing 10k rows in 1 update, and committing every time?

You could try making the condition on the ctid column, to not have to use the index on ID, and process the rows in physical order. First make sure that newly inserted production data has the correct value in the new column, and add 'where new_column is null' to the conditions. But I have never tried this, use at your own risk.

Greetings
Marcin Mank
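[Editor's note: for the two pieces of Marcin's suggestion that are easy to show concretely - one larger batch per statement and an IS NULL test instead of the coalesce - a hedged sketch against the audit table. The id values are illustrative, and the untested ctid variant is deliberately not shown.]

-- 10,000 rows in one statement, committed immediately
UPDATE mdx_core.audit_impt
   SET source_table = 'mdx_import.' || impt_name
 WHERE audit_impt_id >= 319400001
   AND audit_impt_id <  319410001
   AND source_table IS NULL;   -- simpler test than coalesce(source_table, '') = ''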
> What is the rationale behind this? How about doing 10k rows in 1
> update, and committing every time?

When we did 10K updates, the application would sometimes appear to have frozen, and we were concerned that there was a deadlock condition because of the number of locked rows. While we may have the patience to sit around and wait five minutes to see if the update would continue, we couldn't risk having other applications appear frozen if that was the case. In fact, there is no reason for any user or application to write to the same records we are writing to - but the audit table is read frequently. We are not explicitly locking anything, or writing any additional code to arbitrate the locking model anywhere - it's all default UPDATE and SELECT syntax.

Doing the updates in smaller chunks resolved these apparent freezes - or, more specifically, when the application DID freeze, it didn't do it for more than 30 seconds. In all likelihood, this is the OS and the DB thrashing.

We have since modified the updates to process 1000 rows at a time with a commit every 10 pages. Just this morning, though, the IS manager asked me to stop the backfill because of its load impact on other processes.

> You could try making the condition on the ctid column, to not have to
> use the index on ID, and process the rows in physical order.

An interesting idea, if I can confirm that the performance problem is because of the WHERE clause, not the UPDATE.

> 'where new_column is null' to the conditions.

Already being done, albeit with a coalesce(val, '') = '' - it's quite possible that this is hurting the WHERE clause; the EXPLAIN shows the table using the pkey and then filtering on the COALESCE as one would expect.

Carlo
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote: > An interesting idea, if I can confirm that the performance problem > is because of the WHERE clause, not the UPDATE. If you could show EXPLAIN ANALYZE output for one iteration, with related queries and maybe more info on the environment, it would take most of the guesswork out of things. -Kevin
On Thu, Jan 7, 2010 at 2:48 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> Doing the updates in smaller chunks resolved these apparent freezes - or,
> more specifically, when the application DID freeze, it didn't do it for more
> than 30 seconds. In all likelihood, this is the OS and the DB thrashing.

It might well be checkpoints. Have you tried cranking up checkpoint segments to something like 100 or more and seeing how it behaves then?
> It might well be checkpoints. Have you tried cranking up checkpoint
> segments to something like 100 or more and seeing how it behaves then?

No I haven't, although it certainly makes sense - watching the process run, you get this sense that the system occasionally pauses to take a deep, long breath before returning to work frantically ;D

Checkpoint_segments are currently set to 64. The DB is large and is in a constant state of receiving single-row updates as multiple ETL and refinement processes run continuously.

Would you expect going to 100 or more to make an appreciable difference, or should I be more aggressive?
Already done in an earlier post, Kevin - I have included it again below. As you can see, it's pretty well what you would expect: an index scan plus a filter. One note: updates where no rows qualify run appreciably faster than the ones that do. That is, the update itself appears to be consuming a good deal of the processing time. This may be due to the 6 indexes.

UPDATE mdx_core.audit_impt
SET source_table = 'mdx_import.'||impt_name
WHERE audit_impt_id >= 319400001 AND audit_impt_id <= 319400010
AND coalesce(source_table, '') = ''

Index Scan using audit_impt_pkey on audit_impt (cost=0.00..92.63 rows=1 width=608) (actual time=0.081..0.244 rows=10 loops=1)
  Index Cond: ((audit_impt_id >= 319400001) AND (audit_impt_id <= 319400010))
  Filter: ((COALESCE(source_table, ''::character varying))::text = ''::text)
Total runtime: 372.141 ms

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote in message news:4B462563020000250002DFA3@gw.wicourts.gov...
> "Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:
>
>> An interesting idea, if I can confirm that the performance problem
>> is because of the WHERE clause, not the UPDATE.
>
> If you could show EXPLAIN ANALYZE output for one iteration, with
> related queries and maybe more info on the environment, it would
> take most of the guesswork out of things.
>
> -Kevin
On Thu, Jan 7, 2010 at 11:14 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: >> It might well be checkpoints. Have you tried cranking up checkpoint >> segments to something like 100 or more and seeing how it behaves then? > > No I haven't, althugh it certainly make sense - watching the process run, > you get this sense that the system occaisionally pauses to take a deep, long > breath before returning to work frantically ;D > > Checkpoint_segments are currently set to 64. The DB is large and is on a > constant state of receiving single-row updates as multiple ETL and > refinement processes run continuously. > > Would you expect going to 100 or more to make an appreciable difference, or > should I be more aggressive? If you're already at 64 then not much. Probably wouldn't hurt to crank it up more and delay the checkpoints as much as possible during these updates. 64 segments is already 1024M. If you're changing a lot more data than that in a single update / insert then cranking them up more might help. What you might need to do is to change your completion target to something closer to 100% since it's likely that most of the updates / inserts are not happening to the same rows over and over, but to different rows for each one, the closer you can get to 100% completed before the next checkpoint the better. This will cause some more IO to happen, but will even it out more (hopefully) so that you don't get checkpoint spikes. Look into the checkpoint logging options so you can monitor how they're affecting system performance.
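[Editor's note: an illustrative postgresql.conf fragment covering the knobs Scott mentions; the specific values are examples for discussion, not recommendations tuned to this system.]

checkpoint_segments = 128             # up from 64; fewer, larger checkpoints during the bulk update
checkpoint_completion_target = 0.9    # spread checkpoint writes over more of the interval (default 0.5)
log_checkpoints = on                  # log each checkpoint so its timing can be correlated with the stalls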
At 08:17 07/01/2010, Carlo Stonebanks wrote:
> Our DB has an audit table which is 500M rows and growing. [...]
> As always, thanks!
> Carlo

You can dump the table with pg_dump and get a file like this (only a few columns, not all of them). Note that the data lines at the end are in TSV (tab-separated values) format, each terminated by a LF.

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;

CREATE TABLE mdx_core.audit_impt (
  audit_impt_id serial NOT NULL,
  impt_session integer,
  impt_version character varying(255)
);

COPY mdx_core.audit_impt (audit_impt_id, impt_session, impt_version) FROM stdin;
1       1       tateti
2       32      coll

You can add a new column by hacking this file: add the new column to the schema, add its name to the COPY statement, and append a tab plus the data at the end of each line (before the LF). Note that the table name is different from the original.

(same SET block as above, then:)

CREATE TABLE mdx_core.audit_impt2 (
  audit_impt_id serial NOT NULL,
  impt_session integer,
  impt_version character varying(255),
  source_table character varying(255)
);

COPY mdx_core.audit_impt2 (audit_impt_id, impt_session, impt_version, source_table) FROM stdin;
1       1       tateti  tentown
1       32      coll    krof

After this, add indexes and constraints as usual.

HTH

--------------------------------
Eduardo Morrás González
Dept. I+D+i e-Crime Vigilancia Digital
S21sec Labs
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote: > Already done in an earlier post Perhaps I misunderstood; I thought that post mentioned that the plan was one statement in an iteration, and that the cache would have been primed by a previous query checking whether there were any rows to update. If that was the case, it might be worthwhile to look at the entire flow of an iteration. Also, if you ever responded with version and configuration information, I missed it. The solution to parts of what you describe would be different in different versions. In particular, you might be able to solve checkpoint-related lockup issues and then improve performance by using bigger batches. Right now I would be guessing at what might work for you. -Kevin
> crank it up more and delay the checkpoints as much as possible during
> these updates. 64 segments is already 1024M.

We have 425M rows, and the total table size is 78GB, so we can imagine a worst-case UPDATE write of less than 200 bytes * the number of rows specified in the update (is that logic correct?).

Interestingly, the total index size is 148GB, twice that of the table, which may be an indication of where the performance bottleneck is.
Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > Inerestingly, the total index size is 148GB, twice that of the table, > which may be an indication of where the performance bottleneck is. Maybe a sign for massive index-bloat? Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
>I thought that post mentioned that the plan > was one statement in an iteration, and that the cache would have > been primed by a previous query checking whether there were any rows > to update. If that was the case, it might be worthwhile to look at > the entire flow of an iteration. This is the only SQL query in the code in question - the rest of the code manages the looping and commit. The code was copied to PgAdminIII and values written in for the WHERE clause. In order for me to validate that rows would have been updated, I had to run a SELECT with the same WHERE clause in PgAdminIII first to see how many rows would have qualified. But this was for testing purposes only. The SELECT statement does not exist in the code. The vast majority of the rows that will be processed will be updated as this is a backfill to synch the old rows with the values being filled into new columns now being inserted. > Also, if you ever responded with version and configuration > information, I missed it. This is hosted on a new server the client set up so I am waiting for the exact OS and hardware config. PG Version is PostgreSQL 8.3.6, compiled by Visual C++ build 1400, OS appears to be Windows 2003 x64 Server. More than anything, I am more concerned with the long-term use of the system. This particular challenge with the 500M row update is one thing, but I am concerned about the exceptional effort required to do this. Is it REALLY this exceptional to want to update 500M rows of data in this day and age? Or is the fact that we are considering dumping and restoring and dropping indexes, etc to do all an early warning that we don't have a solution that is scaled to the problem? Config data follows (I am assuming commented values which I did not include are defaulted). Carlo autovacuum = on autovacuum_analyze_scale_factor = 0.1 autovacuum_analyze_threshold = 250 autovacuum_naptime = 1min autovacuum_vacuum_scale_factor = 0.2 autovacuum_vacuum_threshold = 500 bgwriter_lru_maxpages = 100 checkpoint_segments = 64 checkpoint_warning = 290 datestyle = 'iso, mdy' default_text_search_config = 'pg_catalog.english' lc_messages = 'C' lc_monetary = 'C' lc_numeric = 'C' lc_time = 'C' log_destination = 'stderr' log_line_prefix = '%t ' logging_collector = on maintenance_work_mem = 16MB max_connections = 200 max_fsm_pages = 204800 max_locks_per_transaction = 128 port = 5432 shared_buffers = 500MB vacuum_cost_delay = 100 work_mem = 512MB
On Fri, Jan 08, 2010 at 12:38:46PM -0500, Carlo Stonebanks wrote: >> I thought that post mentioned that the plan >> was one statement in an iteration, and that the cache would have >> been primed by a previous query checking whether there were any rows >> to update. If that was the case, it might be worthwhile to look at >> the entire flow of an iteration. > > This is the only SQL query in the code in question - the rest of the code > manages the looping and commit. The code was copied to PgAdminIII and > values written in for the WHERE clause. In order for me to validate that > rows would have been updated, I had to run a SELECT with the same WHERE > clause in PgAdminIII first to see how many rows would have qualified. But > this was for testing purposes only. The SELECT statement does not exist in > the code. The vast majority of the rows that will be processed will be > updated as this is a backfill to synch the old rows with the values being > filled into new columns now being inserted. > >> Also, if you ever responded with version and configuration >> information, I missed it. > > This is hosted on a new server the client set up so I am waiting for the > exact OS and hardware config. PG Version is PostgreSQL 8.3.6, compiled by > Visual C++ build 1400, OS appears to be Windows 2003 x64 Server. > > More than anything, I am more concerned with the long-term use of the > system. This particular challenge with the 500M row update is one thing, > but I am concerned about the exceptional effort required to do this. Is it > REALLY this exceptional to want to update 500M rows of data in this day and > age? Or is the fact that we are considering dumping and restoring and > dropping indexes, etc to do all an early warning that we don't have a > solution that is scaled to the problem? > > Config data follows (I am assuming commented values which I did not include > are defaulted). > > Carlo > Hi Carlo, It all boils down to resource management and constraints. For small problems relative to the amount of system resources available, little effort is needed to have satisfactory performance. As the problems consume more and more of the total resource capacity, you will need to know more and more in depth about the workings of every piece of the system to wring the most possible performance out of the system. Some of the discussions on this topic have covered a smattering of details that will become increasingly important as your system scales and determine whether or not it will scale. Many times the need for updates on such a massive scale do point to normalization problems. My two cents. Cheers, Ken
Carlo Stonebanks wrote:
> This is hosted on a new server the client set up so I am waiting for
> the exact OS and hardware config. PG Version is PostgreSQL 8.3.6,
> compiled by Visual C++ build 1400, OS appears to be Windows 2003 x64
> Server.
>
> More than anything, I am more concerned with the long-term use of the
> system. This particular challenge with the 500M row update is one
> thing, but I am concerned about the exceptional effort required to do
> this. Is it REALLY this exceptional to want to update 500M rows of
> data in this day and age? Or is the fact that we are considering
> dumping and restoring and dropping indexes, etc to do all an early
> warning that we don't have a solution that is scaled to the problem?

It's certainly not common or easy to handle. If someone told me I had to make that task perform well and the tools at hand were anything other than a largish UNIX-ish server with a properly designed disk subsystem, I'd tell them it's unlikely to work well.

An UPDATE is the most intensive single operation you can do in PostgreSQL; the full lifecycle of executing it requires:

- Creating a whole new row
- Updating all the indexes to point to the new row
- Marking the original row dead
- Pruning the original row out of the database once it's no longer visible anywhere (VACUUM)

As a rule, if you can't fit a large chunk of the indexes involved in shared_buffers on your system, this is probably going to have terrible performance. And you're on a platform where that's very hard to do, even if there's a lot of RAM around. It sounds like your system spends all its time swapping index blocks in and out of the database buffer cache here. That suggests there's a design problem here either with those indexes (they're too big) or with the provisioned hardware/software combination (needs more RAM, faster disks, or a platform where large amounts of RAM work better).

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote: > In order for me to validate that rows would have been updated, I > had to run a SELECT with the same WHERE clause in PgAdminIII first > to see how many rows would have qualified. But this was for > testing purposes only. The SELECT statement does not exist in the > code. OK, I did misunderstand your earlier post. Got it now, I think. > This is hosted on a new server the client set up so I am waiting > for the exact OS and hardware config. PG Version is PostgreSQL > 8.3.6, compiled by Visual C++ build 1400, OS appears to be Windows > 2003 x64 Server. That might provide more clues, when you get it. > bgwriter_lru_maxpages = 100 With the large database size and the apparent checkpoint-related delays, I would make that more aggressive. Getting dirty pages to the OS cache reduces how much physical I/O needs to happen during checkpoint. We use this on our large databases: bgwriter_lru_maxpages = 1000 bgwriter_lru_multiplier = 4.0 Boosting your checkpoint_completion_target along with or instead of a more aggressive background writer might also help. > max_fsm_pages = 204800 This looks suspiciously low for the size of your database. If you do a VACUUM VERBOSE (for the database), what do the last few lines show? > work_mem = 512MB That's OK only if you are sure you don't have a lot of connections requesting that much RAM at one time, or you could drive yourself into swapping. I hope this helps. -Kevin
>> crank it up more and delay the checkpoints as much as possible during
>> these updates. 64 segments is already 1024M.
>
> We have 425M rows, total table size is 78GB, so we can imagine a worst
> case UPDATE write is less than 200 bytes * number of rows specified in
> the update (is that logic correct?).

There is also the WAL: all these updates need to be logged, which doubles the UPDATE write throughput. Perhaps you're WAL-bound (every 16MB segment needs fsyncing), and tuning of fsync and wal_buffers, or a faster WAL disk, could help? (I don't remember your config.)

> Interestingly, the total index size is 148GB, twice that of the table,
> which may be an indication of where the performance bottleneck is.

Index updates can create random I/O (suppose you have a btree on a rather random column)...
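[Editor's note: an illustration of the WAL-side settings being discussed; the values and the commented-out fsync line are only examples. Turning fsync off trades crash safety for speed and is generally only defensible for a disposable bulk load that can be re-run from scratch.]

wal_buffers = 16MB      # the 8.3 default (64kB) is small for sustained heavy write traffic
#fsync = off            # dangerous: only if losing and re-running the whole backfill after a crash is acceptable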
My client just informed me that new hardware is available for our DB server. . Intel Core 2 Quads Quad . 48 GB RAM . 4 Disk RAID drive (RAID level TBD) I have put the ugly details of what we do with our DB below, as well as the postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB with very large tables and the server is always busy serving a constant stream of single-row UPDATEs and INSERTs from parallel automated processes. There are less than 10 users, as the server is devoted to the KB production system. My questions: 1) Which RAID level would you recommend 2) Which Windows OS would you recommend? (currently 2008 x64 Server) 3) If we were to port to a *NIX flavour, which would you recommend? (which support trouble-free PG builds/makes please!) 4) Is this the right PG version for our needs? Thanks, Carlo The details of our use: . The DB hosts is a data warehouse and a knowledgebase (KB) tracking the professional information of 1.3M individuals. . The KB tables related to these 130M individuals are naturally also large . The DB is in a perpetual state of serving TCL-scripted Extract, Transform and Load (ETL) processes . These ETL processes typically run 10 at-a-time (i.e. in parallel) . We would like to run more, but the server appears to be the bottleneck . The ETL write processes are 99% single row UPDATEs or INSERTs. . There are few, if any DELETEs . The ETL source data are "import tables" . The import tables are permanently kept in the data warehouse so that we can trace the original source of any information. . There are 6000+ and counting . The import tables number from dozens to hundreds of thousands of rows. They rarely require more than a pkey index. . Linking the KB to the source import date requires an "audit table" of 500M rows, and counting. . The size of the audit table makes it very difficult to manage, especially if we need to modify the design. . Because we query the audit table different ways to audit the ETL processes decisions, almost every column in the audit table is indexed. . The maximum number of physical users is 10 and these users RARELY perform any kind of write . By contrast, the 10+ ETL processes are writing constantly . We find that internal stats drift, for whatever reason, causing row seq scans instead of index scans. . So far, we have never seen a situation where a seq scan has improved performance, which I would attribute to the size of the tables . We believe our requirements are exceptional, and we would benefit immensely from setting up the PG planner to always favour index-oriented decisions - which seems to contradict everything that PG advice suggests as best practice. Current non-default conf settings are: autovacuum = on autovacuum_analyze_scale_factor = 0.1 autovacuum_analyze_threshold = 250 autovacuum_naptime = 1min autovacuum_vacuum_scale_factor = 0.2 autovacuum_vacuum_threshold = 500 bgwriter_lru_maxpages = 100 checkpoint_segments = 64 checkpoint_warning = 290 datestyle = 'iso, mdy' default_text_search_config = 'pg_catalog.english' lc_messages = 'C' lc_monetary = 'C' lc_numeric = 'C' lc_time = 'C' log_destination = 'stderr' log_line_prefix = '%t ' logging_collector = on maintenance_work_mem = 16MB max_connections = 200 max_fsm_pages = 204800 max_locks_per_transaction = 128 port = 5432 shared_buffers = 500MB vacuum_cost_delay = 100 work_mem = 512MB
Guys, I want to thank you for all of the advice - my client has just made a surprise announcement that he would like to start from scratch with a new server, so I am afraid that all of this great advice has to be seen in the context of whatever decision is made on that. I am out there, hat-in-hand, looking for advice under the PERFORM post: "New server to improve performance on our large and busy DB - advice?"

Thanks again!

Carlo

"Scott Marlowe" <scott.marlowe@gmail.com> wrote in message news:dcc563d11001071740q572cdae2re410788fe790dbf6@mail.gmail.com...
> On Thu, Jan 7, 2010 at 2:48 PM, Carlo Stonebanks
> <stonec.register@sympatico.ca> wrote:
>> Doing the updates in smaller chunks resolved these apparent freezes - or,
>> more specifically, when the application DID freeze, it didn't do it for
>> more than 30 seconds. In all likelihood, this is the OS and the DB thrashing.
>
> It might well be checkpoints. Have you tried cranking up checkpoint
> segments to something like 100 or more and seeing how it behaves then?
Re: New server to improve performance on our large and busy DB - advice?
From: "Joshua D. Drake"
On Thu, 14 Jan 2010 14:17:13 -0500, "Carlo Stonebanks" <stonec.register@sympatico.ca> wrote: > My client just informed me that new hardware is available for our DB > server. > > . Intel Core 2 Quads Quad > . 48 GB RAM > . 4 Disk RAID drive (RAID level TBD) > > I have put the ugly details of what we do with our DB below, as well as > the > postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB > with very large tables and the server is always busy serving a constant > stream of single-row UPDATEs and INSERTs from parallel automated processes. > > There are less than 10 users, as the server is devoted to the KB > production > system. > > My questions: > > 1) Which RAID level would you recommend 10 > 2) Which Windows OS would you recommend? (currently 2008 x64 Server) If you have to run Windows... that works. > 3) If we were to port to a *NIX flavour, which would you recommend? (which > support trouble-free PG builds/makes please!) Community driven: Debian Stable CentOS 5 Commercial: Ubuntu LTS RHEL 5 > 4) Is this the right PG version for our needs? You want to run at least the latest stable 8.3 series which I believe is 8.3.9. With the imminent release of 8.5 (6 months), it may be time to move to 8.4.2 instead. Joshua D. Drake > > Thanks, > > Carlo > > The details of our use: > > . The DB hosts is a data warehouse and a knowledgebase (KB) tracking the > professional information of 1.3M individuals. > . The KB tables related to these 130M individuals are naturally also large > . The DB is in a perpetual state of serving TCL-scripted Extract, > Transform > and Load (ETL) processes > . These ETL processes typically run 10 at-a-time (i.e. in parallel) > . We would like to run more, but the server appears to be the bottleneck > . The ETL write processes are 99% single row UPDATEs or INSERTs. > . There are few, if any DELETEs > . The ETL source data are "import tables" > . The import tables are permanently kept in the data warehouse so that we > can trace the original source of any information. > . There are 6000+ and counting > . The import tables number from dozens to hundreds of thousands of rows. > They rarely require more than a pkey index. > . Linking the KB to the source import date requires an "audit table" of > 500M > rows, and counting. > . The size of the audit table makes it very difficult to manage, > especially > if we need to modify the design. > . Because we query the audit table different ways to audit the ETL > processes > decisions, almost every column in the audit table is indexed. > . The maximum number of physical users is 10 and these users RARELY > perform > any kind of write > . By contrast, the 10+ ETL processes are writing constantly > . We find that internal stats drift, for whatever reason, causing row seq > scans instead of index scans. > . So far, we have never seen a situation where a seq scan has improved > performance, which I would attribute to the size of the tables > . We believe our requirements are exceptional, and we would benefit > immensely from setting up the PG planner to always favour index-oriented > decisions - which seems to contradict everything that PG advice suggests > as > best practice. 
> > Current non-default conf settings are:
> > [non-default postgresql.conf settings snipped; see Carlo's post above]

--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Carlo Stonebanks wrote:
> Guys, I want to thank you for all of the advice - my client has just
> made a surprise announcement that he would like to start from
> scratch with a new server, so I am afraid that all of this great advice
> has to be seen in the context of whatever decision is made on that. I am
> out there, hat-in-hand, looking for advice under the PERFORM post: "New
> server to improve performance on our large and busy DB - advice?"

You might start this as a new topic with a relevant title, and reiterate your database requirements. Otherwise it will get submerged as just a footnote to your original question. It's really nice to be able to quickly find the new-equipment discussions.

Craig
On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > My client just informed me that new hardware is available for our DB server. > > . Intel Core 2 Quads Quad > . 48 GB RAM > . 4 Disk RAID drive (RAID level TBD) > > I have put the ugly details of what we do with our DB below, as well as the > postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB > with very large tables and the server is always busy serving a constant > stream of single-row UPDATEs and INSERTs from parallel automated processes. > > There are less than 10 users, as the server is devoted to the KB production > system. > > My questions: > > 1) Which RAID level would you recommend RAID-10 with a battery backed hardware caching controller. > 2) Which Windows OS would you recommend? (currently 2008 x64 Server) That's probably the most stable choice out there for Windows. > 3) If we were to port to a *NIX flavour, which would you recommend? (which > support trouble-free PG builds/makes please!) I'd parrot what Joshua Drake said here. Centos / RHEL / Debian / Ubuntu > 4) Is this the right PG version for our needs? 8.3 is very stable. Update to the latest. 8.4 seems good, but I've had, and still am having, problems with it crashing in production. Not often, maybe once every couple of months, but just enough that I'm not ready to try and use it there yet. And I can't force the same failure in testing, at least not yet. > The details of our use: > > . These ETL processes typically run 10 at-a-time (i.e. in parallel) > . We would like to run more, but the server appears to be the bottleneck > . The ETL write processes are 99% single row UPDATEs or INSERTs. Can you run the ETL processes in such a way that they can do many inserts and updates at once? That would certainly speed things up a bit. > . The size of the audit table makes it very difficult to manage, especially > if we need to modify the design. You might want to look into partitioning / inheritance if that would help. > . Because we query the audit table different ways to audit the ETL processes > decisions, almost every column in the audit table is indexed. This may or may not help. If you're querying it and the part in the where clause referencing this column isn't very selective, and index won't be chosen anyway. If you've got multiple columns in your where clause, the more selective ones will use and index and the rest will get filtered out instead of using an index. Look in pg_stat_user_indexes for indexes that don't get used and drop them unless, of course, they're unique indexes. > . The maximum number of physical users is 10 and these users RARELY perform > any kind of write > . By contrast, the 10+ ETL processes are writing constantly You may be well served by having two servers, one to write to, and a slave that is used by the actual users. Our slony slaves have a much easier time writing out their data than our master database does. > . We find that internal stats drift, for whatever reason, causing row seq > scans instead of index scans. Yeah, this is a known problem on heavily updated tables and recent entries. Cranking up autovacuum a bit can help, but often it requires special treatment, either by adjusting the autovac analyze threshold values for just those tables, or running manual analyzes every couple of minutes. > . 
So far, we have never seen a situation where a seq scan has improved > performance, which I would attribute to the size of the tables Not so much the size of the tables, as the size of the request. If you were running aggregates across whole large tables, a seq scan would definitely be the way to go. If you're asking for one row, index scan should win. Somewhere between those two, when you get up to hitting some decent percentage of the rows, the switch from index scan to seq scan makes sense, and it's likely happening too early for you. Look at random_page_cost and effective_cache_size for starters. > . We believe our requirements are exceptional, and we would benefit > immensely from setting up the PG planner to always favour index-oriented > decisions - which seems to contradict everything that PG advice suggests as > best practice. See previous comment I made up there ^^^ It's not about always using indexes, it's about giving the planner the information it needs to make the right choice. > Current non-default conf settings are: > > autovacuum = on > autovacuum_analyze_scale_factor = 0.1 You might wanna lower the analyze scale factor if you're having problems with bad query plans on fresh data. > autovacuum_analyze_threshold = 250 > autovacuum_naptime = 1min > autovacuum_vacuum_scale_factor = 0.2 > autovacuum_vacuum_threshold = 500 > bgwriter_lru_maxpages = 100 > checkpoint_segments = 64 > checkpoint_warning = 290 > datestyle = 'iso, mdy' > default_text_search_config = 'pg_catalog.english' > lc_messages = 'C' > lc_monetary = 'C' > lc_numeric = 'C' > lc_time = 'C' > log_destination = 'stderr' > log_line_prefix = '%t ' > logging_collector = on > maintenance_work_mem = 16MB > max_connections = 200 > max_fsm_pages = 204800 The default tends to be low. Run vacuum verbose to see if you're overrunning the max_fsm_pages settings or the max_fsm_relations. > max_locks_per_transaction = 128 > port = 5432 > shared_buffers = 500MB > vacuum_cost_delay = 100 That's REALLY REALLY high. You might want to look at something in the 5 to 20 range. > work_mem = 512MB
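[Editor's note: a hedged example of the unused-index check Scott mentions above. pg_stat_user_indexes is the standard statistics view; treating zero scans as "unused" is only meaningful once the statistics have been accumulating over a representative period.]

SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY schemaname, relname, indexrelname;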
My bad - I thought I had, so it has been re-posted with a (v2) disclaimer in the title... like THAT will stop the flaming! <g> Thanks for your patience! "Craig James" <craig_james@emolecules.com> wrote in message news:4B4F8A49.7010906@emolecules.com... > Carlo Stonebanks wrote: >> Guys, I want to thank you for all of the advice - my client has just made >> a surprise announcement that he would like to set start from scratch with >> a new server, so I am afraid that all of this great advice has to be seen >> in the context of whatever decision is made on that. I am out there, >> hat-in-hand, looking for advice under the PERFORM post: "New server to >> improve performance on our large and busy DB - advice?" > > You might start this as a new topic with a relevant title, and reiterate > your database requirements. Otherwise it will get submerged as just a > footnote to your original question. It's really nice to be able to > quickly find the new-equipment discussions. > > Craig > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Carlo Stonebanks wrote: > 1) Which RAID level would you recommend It looks like you stepped over a critical step, which is "will the server have a good performing RAID card?". Your whole upgrade could underperform if you make a bad mistake on that part. It's really important to nail that down, and to benchmark to prove you got what you expected from your hardware vendor. > 3) If we were to port to a *NIX flavour, which would you recommend? > (which support trouble-free PG builds/makes please!) The only platform I consider close to trouble free as far as the PG builds working without issues are RHEL/CentOS, due to the maturity of the PGDG yum repository and how up to date it's kept. Every time I wander onto another platform I find the lag and care taken in packaging PostgreSQL to be at least a small step down from there. > 4) Is this the right PG version for our needs? 8.4 removes the FSM, which takes away a common source for unexpected performance issues when you overflow max_fsm_pages one day. If you're going to deploy 8.3, you need to be more careful to monitor the whole VACUUM process; it's easier to ignore in 8.4 and still get by OK. As far as general code stability goes, I think it's a wash at this point. You might discover a bug in 8.4 that causes a regression, but I think you're just as likely to run into a situation that 8.3 handles badly that's improved in 8.4. Hard to say which will work out better in a really general way. > . We believe our requirements are exceptional, and we would benefit > immensely from setting up the PG planner to always favour > index-oriented decisions - which seems to contradict everything that > PG advice suggests as best practice. Pretty much everyone thinks their requirements are exceptional. It's funny how infrequently that's actually true. The techniques that favor index-use aren't that unique: collect better stats, set basic parameters correctly, adjust random_page_cost, investigate plans that don't do what you want to figure out why. It's easy to say there's something special about your data rather than follow fundamentals here; I'd urge you to avoid doing that. The odds that the real issue is that you're feeding the optimizer bad data is more likely than most people think, which brings us to: > Current non-default conf settings are: I don't see effective_cache_size listed there. If that's at the default, I wouldn't be surprised that you're seeing sequential scans instead of indexed ones far too often. > max_connections = 200 > work_mem = 512MB This is a frightening combination by the way. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
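[Editor's note: for concreteness, the two planner-related settings Greg calls out would look something like this in postgresql.conf; the numbers are purely illustrative and would have to reflect the RAM and storage actually provisioned on the new server.]

effective_cache_size = 36GB   # rough estimate of OS cache plus shared_buffers available for caching
random_page_cost = 2.0        # lower than the default 4.0 when most index reads come from cache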
Re: New server to improve performance on our large and busy DB - advice?
From: "Carlo Stonebanks"
> Pretty much everyone thinks their requirements are exceptional. It's
> funny how infrequently that's actually true. The techniques that favor
> index-use aren't that unique: collect better stats, set basic parameters
> correctly, adjust random_page_cost, investigate plans that don't do what
> you want to figure out why. It's easy to say there's something special
> about your data rather than follow fundamentals here; I'd urge you to
> avoid doing that. The odds that the real issue is that you're feeding the
> optimizer bad data is more likely than most people think, which brings us
> to:

I understand that. And the answer is usually to go and do an ANALYZE manually (if it isn't this, it will be some dependency on a set-returning stored function we wrote before we could specify the rows and cost). My question is really - why do I need this constant intervention? When we rarely do aggregates, when our queries are (nearly) always single-row queries (and very rarely more than 50 rows) out of tables that have hundreds of thousands to millions of rows, what does it take to NOT have to intervene? Which brings me to your next point:

> I don't see effective_cache_size listed there. If that's at the default,
> I wouldn't be surprised that you're seeing sequential scans instead of
> indexed ones far too often.

Nice to know - I suspect someone has been messing around with stuff they don't understand. I do know that after some screwing around they got the server to the point that it wouldn't restart and tried to back out until it would.

>> max_connections = 200
>> work_mem = 512MB
> This is a frightening combination by the way.

Looks like it's connected to the above issue. The real max connection value is 1/10th of that.

Thanks Greg!

Carlo
On Thu, Jan 14, 2010 at 8:17 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> . 48 GB RAM
> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

There is no 64-bit Windows build at the moment - you would be limited to shared_buffers of about a gigabyte. Choose Linux.

Greetings
Marcin Mańk
Re: New server to improve performance on our large and busy DB - advice?
From: "Carlo Stonebanks"
Hi Scott,

Sorry for the very late reply on this post, but I'd like to follow up. The reason that I took so long to reply was due to this suggestion:

<< Run vacuum verbose to see if you're overrunning the max_fsm_pages settings or the max_fsm_relations. >>

My first thought was, does he mean against the entire DB? That would take a week! But, since it was recommended, I decided to see what would happen. So, I just ran VACUUM VERBOSE. After five days, it was still vacuuming and the server admin said they needed to bounce the server, which means the command never completed. (I kept the log of the progress so far, but don't know if the values you needed would appear at the end.) I confess I have no idea how to relate the INFO and DETAIL data coming back to the max_fsm_pages setting or max_fsm_relations.

So, now my questions are:

1) Did you really mean you wanted VACUUM VERBOSE to run against the entire DB?
2) Given my previous comments on the size of the DB (and my thinking that this is an exceptionally large and busy DB), were you expecting it to take this long?
3) I took no exceptional measures before running it; I didn't stop the automated import processes, and I didn't turn off autovacuum. Would this have accounted for the time it is taking to THAT degree?
4) Any other way to get the max_fsm_pages and max_fsm_relations numbers?

Carlo
On Tue, Jan 19, 2010 at 2:09 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> Hi Scott,
>
> Sorry for the very late reply on this post, but I'd like to follow up. The
> reason that I took so long to reply was due to this suggestion:
>
> <<Run vacuum verbose to see if you're
> overrunning the max_fsm_pages settings or the max_fsm_relations.>>
>
> My first thought was, does he mean against the entire DB? That would take a
> week! But, since it was recommended, I decided to see what would happen. So,
> I just ran VACUUM VERBOSE. After five days, it was still vacuuming and the
> server admin said they needed to bounce the server, which means the command
> never completed. (I kept the log of the progress so far, but don't know if
> the values you needed would appear at the end.) I confess I have no idea how
> to relate the INFO and DETAIL data coming back with regards to the
> max_fsm_pages settings or the max_fsm_relations.

Yeah, the values are at the end. Sounds like your vacuum settings are too non-aggressive. Generally this is the vacuum cost delay being too high.

> So, now my questions are:
>
> 1) Did you really mean you wanted VACUUM VERBOSE to run against the entire
> DB?

Yes. A whole db at least. However...

> 2) Given my previous comments on the size of the DB (and my thinking that
> this is an exceptionally large and busy DB), were you expecting it to take
> this long?

Yes, I was figuring it would be a while. However...

> 3) I took no exceptional measures before running it; I didn't stop the
> automated import processes, I didn't turn off autovacuum. Would this have
> accounted for the time it is taking to THAT degree?

Nah, not really. However...

> 4) Any other way to get the max_fsm_pages settings and max_fsm_relations?

Yes! You can run vacuum verbose against the regular old postgres database (or just create one for testing with nothing in it) and you'll still get the fsm usage numbers from that! So, no need to run it against the big db. However, if regular vacuum verbose couldn't finish in a week, then you've likely got vacuum and autovacuum set to be too timid in their operation, and may be getting pretty bloated as we speak. Once the fsm gets too blown out of the water, it's quicker to dump and reload the whole DB than to try and fix it.
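A minimal sketch of what is being suggested here; the NOTICE shown is only the shape of the output, not numbers from this cluster.

-- Connect to the small built-in "postgres" database (or a throwaway empty
-- one) on the same cluster and run a plain database-wide vacuum:
VACUUM VERBOSE;
-- The free-space-map summary at the end covers the whole cluster, e.g.:
--   NOTICE:  number of page slots needed (...) exceeds max_fsm_pages (...)
--   HINT:  Consider increasing the configuration parameter "max_fsm_pages" ...

Because the small database itself has almost nothing to vacuum, this finishes in seconds rather than days.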
Re: New server to improve performance on our large and busy DB - advice?
From: "Carlo Stonebanks"
> Yeah, the values are at the end. Sounds like your vacuum settings are
> too non-aggressive. Generally this is the vacuum cost delay being too
> high.

Of course, I have to ask: what's the down side?

> Yes! You can run vacuum verbose against the regular old postgres
> database (or just create one for testing with nothing in it) and
> you'll still get the fsm usage numbers from that! So, no need to run
> it against the big db. However, if regular vacuum verbose couldn't
> finish in a week, then you've likely got vacuum and autovacuum set to
> be too timid in their operation, and may be getting pretty bloated as
> we speak. Once the fsm gets too blown out of the water, it's quicker
> to dump and reload the whole DB than to try and fix it.

My client reports this is what they actually do on a monthly basis.

And the numbers are in:

>> NOTICE: number of page slots needed (4090224) exceeds max_fsm_pages
>> (204800)
>> HINT: Consider increasing the configuration parameter "max_fsm_pages" to
>> a value over 4090224.

Gee, only off by a factor of 20. What happens if I go for this number (once again, what's the down side)?

Carlo
Re: New server to improve performance on our large and busy DB - advice?
From: "Kevin Grittner"
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote: >> yeah, the values are at the end. Sounds like your vacuum >> settings are too non-aggresive. Generally this is the vacuum >> cost delay being too high. > > Of course, I have to ask: what's the down side? If you make it too aggressive, it could impact throughput or response time. Odds are that the bloat from having it not aggressive enough is currently having a worse impact. >> Once the fsm gets too blown out of the water, it's quicker >> to dump and reload the whole DB than to try and fix it. > > My client reports this is what they actualyl do on a monthly > basis. The probably won't need to do that with proper configuration and vacuum policies. >>> NOTICE: number of page slots needed (4090224) exceeds >>> max_fsm_pages (204800) >>> HINT: Consider increasing the configuration parameter >>> "max_fsm_pages" to a value over 4090224. > > Gee, only off by a factor of 20. What happens if I go for this > number (once again, what's the down side)? It costs six bytes of shared memory per entry. http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM -Kevin
On Wed, Jan 20, 2010 at 3:03 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
>> Yes! You can run vacuum verbose against the regular old postgres
>> database (or just create one for testing with nothing in it) and
>> you'll still get the fsm usage numbers from that! So, no need to run
>> it against the big db. However, if regular vacuum verbose couldn't
>> finish in a week, then you've likely got vacuum and autovacuum set to
>> be too timid in their operation, and may be getting pretty bloated as
>> we speak. Once the fsm gets too blown out of the water, it's quicker
>> to dump and reload the whole DB than to try and fix it.
>
> My client reports this is what they actually do on a monthly basis.

Something is deeply wrong with your client's vacuuming policies.

...Robert
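For reference, these are the cost-based vacuum knobs pointed at earlier in the thread; the values below are only meant to illustrate "less timid" on 8.3, not a tuned recommendation for this workload.

# Manual VACUUM: no throttling at all (the 8.3 default is already 0).
vacuum_cost_delay = 0
# Autovacuum: halve the default 20ms delay so it keeps up with churn.
autovacuum_vacuum_cost_delay = 10ms
# Vacuum a table after ~10% of it changes, instead of the default 20%.
autovacuum_vacuum_scale_factor = 0.1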
Scott Marlowe escribió:
> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
> <stonec.register@sympatico.ca> wrote:
> > 4) Is this the right PG version for our needs?
>
> 8.3 is very stable. Update to the latest. 8.4 seems good, but I've
> had, and still am having, problems with it crashing in production.
> Not often, maybe once every couple of months, but just enough that I'm
> not ready to try and use it there yet. And I can't force the same
> failure in testing, at least not yet.

uh. Is there a report of the crash somewhere with details, say stack traces and such?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thu, Jan 21, 2010 at 8:51 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Scott Marlowe escribió:
>> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
>> <stonec.register@sympatico.ca> wrote:
>
>> > 4) Is this the right PG version for our needs?
>>
>> 8.3 is very stable. Update to the latest. 8.4 seems good, but I've
>> had, and still am having, problems with it crashing in production.
>> Not often, maybe once every couple of months, but just enough that I'm
>> not ready to try and use it there yet. And I can't force the same
>> failure in testing, at least not yet.
>
> uh. Is there a report of the crash somewhere with details, say stack
> traces and such?

No, the only server that does this is in production as our stats db and when it happens it usually gets restarted immediately. It does this about once every two months. Do the PGDG releases have debugging symbols and what not? I'll see about having a stack trace ready to run for the next time it does this.
Scott Marlowe escribió:
> On Thu, Jan 21, 2010 at 8:51 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > Scott Marlowe escribió:
> >> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
> >> <stonec.register@sympatico.ca> wrote:
> >
> >> > 4) Is this the right PG version for our needs?
> >>
> >> 8.3 is very stable. Update to the latest. 8.4 seems good, but I've
> >> had, and still am having, problems with it crashing in production.
> >> Not often, maybe once every couple of months, but just enough that I'm
> >> not ready to try and use it there yet. And I can't force the same
> >> failure in testing, at least not yet.
> >
> > uh. Is there a report of the crash somewhere with details, say stack
> > traces and such?
>
> No, the only server that does this is in production as our stats db
> and when it happens it usually gets restarted immediately. It does
> this about once every two months. Do the PGDG releases have debugging
> symbols and what not? I'll see about having a stack trace ready to
> run for the next time it does this.

You mean the RPMs? Yes, I think Devrim publishes debuginfo packages which you need to install separately.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thu, Jan 21, 2010 at 9:44 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Scott Marlowe escribió:
>> On Thu, Jan 21, 2010 at 8:51 AM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>> > Scott Marlowe escribió:
>> >> On Thu, Jan 14, 2010 at 12:17 PM, Carlo Stonebanks
>> >> <stonec.register@sympatico.ca> wrote:
>> >
>> >> > 4) Is this the right PG version for our needs?
>> >>
>> >> 8.3 is very stable. Update to the latest. 8.4 seems good, but I've
>> >> had, and still am having, problems with it crashing in production.
>> >> Not often, maybe once every couple of months, but just enough that I'm
>> >> not ready to try and use it there yet. And I can't force the same
>> >> failure in testing, at least not yet.
>> >
>> > uh. Is there a report of the crash somewhere with details, say stack
>> > traces and such?
>>
>> No, the only server that does this is in production as our stats db
>> and when it happens it usually gets restarted immediately. It does
>> this about once every two months. Do the PGDG releases have debugging
>> symbols and what not? I'll see about having a stack trace ready to
>> run for the next time it does this.
>
> You mean the RPMs? Yes, I think Devrim publishes debuginfo packages
> which you need to install separately.

Well crap, this one was built from source, and not with debugging. Gimme a day or so and I'll have it rebuilt with debug and can run a useful backtrace on it.
On Thu, 2010-01-21 at 13:44 -0300, Alvaro Herrera wrote:
> I think Devrim publishes debuginfo packages which you need to install
> separately.

Right.

--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
Re: New server to improve performance on our large and busy DB - advice?
From: "Carlo Stonebanks"
Hi Greg,

As a follow up to this suggestion:

>> I don't see effective_cache_size listed there. If that's at the default,
>> I wouldn't be surprised that you're seeing sequential scans instead of
>> indexed ones far too often.

I found an article written by you, http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm, and thought this was pretty useful, and especially this comment:

<<effective_cache_size should be set to how much memory is leftover for disk caching after taking into account what's used by the operating system, dedicated PostgreSQL memory, and other applications. If it's set too low, indexes may not be used for executing queries the way you'd expect. Setting effective_cache_size to 1/2 of total memory would be a normal conservative setting. You might find a better estimate by looking at your operating system's statistics. On UNIX-like systems, add the free+cached numbers from free or top. On Windows see the "System Cache" in the Windows Task Manager's Performance tab.>>

Are these values to look at BEFORE starting PG? If so, how do I relate the values returned to setting the effective_cache_size values?

Carlo

PS Loved your 1995 era pages. Being a musician, it was great to read your recommendations on how to buy these things called "CD's". I Googled the term, and they appear to be some ancient precursor to MP3s which people actually PAID for. What kind of stone were they engraved on? ;-D
Carlo Stonebanks wrote:
>
> <<effective_cache_size should be set to how much memory is leftover
> for disk caching after taking into account what's used by the
> operating system, dedicated PostgreSQL memory, and other applications.
> If it's set too low, indexes may not be used for executing queries the
> way you'd expect. Setting effective_cache_size to 1/2 of total memory
> would be a normal conservative setting. You might find a better
> estimate by looking at your operating system's statistics. On
> UNIX-like systems, add the free+cached numbers from free or top. On
> Windows see the "System Cache" in the Windows Task Manager's
> Performance tab.>>
>
> Are these values to look at BEFORE starting PG? If so, how do I relate
> the values returned to setting the effective_cache_size values?

After starting the database. You can set effective_cache_size to a size in megabytes, so basically you'd look at the amount of free cache, maybe round down a bit, and set effective_cache_size to exactly that. It's not super important to get the number right. The point is that the default is going to be a tiny number way smaller than the RAM in your system, and even getting it within a factor of 2 or 3 of reality will radically change some types of query plans.

> PS Loved your 1995 era pages. Being a musician, it was great to read
> your recommendations on how to buy these things called "CD's". I
> Googled the term, and they appear to be some ancient precursor to MP3s
> which people actually PAID for. What kind of stone were they engraved
> on? ;-D

They're plastic, just like the iPod, iPhone, iToilet, or whatever other white plastic Apple products people listen to music on in this new era. Since both my CD collection and the stereo I listen to them on are each individually worth more than my car, it's really tough to sell me on all the terrible-sounding MP3s I hear nowadays. I'm the guy who can tell you how the LP, regular CD, gold CD, and SACD/DVD-A for albums I like all compare, so dropping below CD quality is right out. If you ever find yourself going "hey, I wish I had six different versions of 'Dark Side of the Moon' around so I could compare the subtle differences in the mastering and mix on each of them", I'm your guy.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com
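To put rough numbers on that answer (the figures are illustrative, not measured on Carlo's server): if free -m on a 48 GB Linux box shows about 1,000 MB free and 38,000 MB cached, add the two, round down a little, and put the result straight into postgresql.conf:

# free + cached is roughly 39 GB of OS cache available; round down a bit
effective_cache_size = 36GB

Unlike shared_buffers, this setting doesn't allocate anything; it is only a planner hint, so a configuration reload (or even a per-session SET) is enough to try different values.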