Thread: Large DB, several tuning questions: Index sizes, VACUUM, REINDEX, Autovacuum
Large DB, several tuning questions: Index sizes, VACUUM, REINDEX, Autovacuum
From
Andreas Thiel
Date:
Hi All,

Maybe some questions are quite newbie ones, and I did try hard to scan all the articles and documentation, but I did not find a satisfying answer.

I'm running PostgreSQL 8.3.6 on a 32-bit CentOS 4 machine (which I probably should update to 64-bit soon). I have some tables which tend to get huge (and will for sure hit the wall of my storage system soon; total DB is ~700 GB now):

SELECT relfilenode, relpages, reltuples, relname
FROM pg_class WHERE relpages > 10000 ORDER BY relpages DESC;

 relfilenode | relpages | reltuples   | relname
-------------+----------+-------------+------------------------
       72693 | 51308246 | 4.46436e+09 | result_orig
       72711 | 17871658 | 6.15227e+06 | test
       73113 | 12240806 | 4.46436e+09 | result_orig_test_id
       73112 | 12240806 | 4.46436e+09 | result_orig_prt_id
       72717 |   118408 | 6.15241e+06 | test_orig
       72775 |    26489 | 6.15241e+06 | test_orig_lt_id
       72755 |    19865 | 6.15241e+06 | test_orig_test_id_key
       73147 |    16872 | 6.15227e+06 | test_test_id
       73146 |    16872 | 6.15227e+06 | test_lt_id

I'm going to work on the table size of the largest table (result_orig) itself by eliminating columns, stuffing n Booleans into bit(n)'s, replacing double precision by real, etc. By this I should be able to reduce the storage per row to ~1/3 of the bytes currently used.

I have the same information stored in an Oracle 10g DB which consumes only 70G of data and 2G for indexes. The schema may be better optimized, but for sure there is a table with 4 billion rows inside as well. So it's about 10x smaller in disk space than PgSQL. I wonder why.

But still:

### My Issue No. 1: Index Size

What really worries me is the size of the two largest indexes (result_orig_test_id, result_orig_prt_id) I'm using. Both are roughly 1/3 of the result_orig table size, and each index b-tree-indexes only a single bigint column (prt_id, test_id) of result_orig. Roughly every group of 100 rows of result_orig has the same prt_id, and roughly every group of 1000-10000 rows has the same test_id.
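[Editorial sketch: the table and index sizes above can also be checked directly in bytes rather than pages, using the standard size functions available on 8.3; table and index names are taken from the query output above.]

```sql
-- Compare the heap size of result_orig against each of its two big
-- indexes, and the total including all indexes and TOAST data.
SELECT pg_size_pretty(pg_relation_size('result_orig'))         AS table_size,
       pg_size_pretty(pg_relation_size('result_orig_test_id')) AS idx_test_id,
       pg_size_pretty(pg_relation_size('result_orig_prt_id'))  AS idx_prt_id,
       pg_size_pretty(pg_total_relation_size('result_orig'))   AS table_plus_indexes;
```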
Each of these two columns is a foreign key (ON DELETE CASCADE). So my fear is now: even if I can reduce the amount of data per row in result_orig, my indexes will remain as large as before and then dominate disk usage.

Is such disk usage for indexes expected? What can I do to optimize? I could not yet run a VACUUM on result_orig, as I hit the max_fsm_pages limit (still trying to adjust that one). I tried REINDEX, it didn't change anything.

### My Issue No. 2: relpages and VACUUM

I have another table "test" which is - as a starting point - created by INSERTs and then UPDATE'd. It has the same columns and roughly the same number of rows as table test_orig, but consumes 160 times the number of pages. I tried VACUUM on this table but it did not change anything in its relpages count. Maybe this is just because VACUUM without FULL does not reclaim disk space, i.e. relpages stays as it is? I did observe that after VACUUM, a REINDEX on this table did considerably shrink the size of its indexes (test_test_id, test_lt_id).

### My Issue No. 3: VACUUM FULL out of memory

I tried to do a VACUUM FULL on the two tables (test, result_orig) mentioned above. In both cases it fails with a very low number on out of memory, like this:

ERROR:  out of memory
DETAIL:  Failed on request of size 224.

I use these kernel settings:

kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.shmmax = 2147483648
vm.overcommit_memory = 2

And these postgresql.conf settings:

shared_buffers = 512MB            # min 128kB or max_connections*16kB
temp_buffers = 128MB              # min 800kB
max_prepared_transactions = 1024  # can be 0 or more
work_mem = 16MB                   # min 64kB
maintenance_work_mem = 256MB      # min 1MB
max_stack_depth = 8MB             # min 100kB
max_fsm_pages = 70000000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 4194304       # min 100, ~70 bytes each
#max_files_per_process = 1000     # min 25
#shared_preload_libraries = ''    # (change requires restart)

What's going wrong here?
I know one should not use VACUUM FULL, but I was curious to see if this would have any impact on the relpages count mentioned in Issue 2.

### My Issue No. 4: Autovacuum

I have the feeling that Autovacuum is not really running, else why are tables and indexes growing that much, especially the "test" table?

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------
autovacuum = on                        # Enable autovacuum subprocess? 'on'
log_autovacuum_min_duration = 1000     # -1 disables, 0 logs all actions
autovacuum_max_workers = 3             # max number of autovacuum subprocesses
autovacuum_naptime = 1min              # time between autovacuum runs
autovacuum_vacuum_threshold = 50       # min number of row updates before vacuum
autovacuum_analyze_threshold = 50      # min number of row updates before analyze
autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
autovacuum_vacuum_cost_delay = 20      # default vacuum cost delay for autovacuum
autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for autovacuum

How would I check it is running correctly? I don't see any error messages in syslog from autovacuum.

Any help, also on tuning postgresql.conf to this application, is greatly appreciated!

Thanks
Andy
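[Editorial sketch: whether autovacuum has ever touched a given table can be checked in the standard statistics views, which on 8.3 record the last (auto)vacuum and (auto)analyze timestamps; table names taken from the thread.]

```sql
-- NULL in last_autovacuum means autovacuum has not vacuumed the table
-- since the statistics were last reset.
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('test', 'result_orig');
```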
Re: Large DB, several tuning questions: Index sizes, VACUUM, REINDEX, Autovacuum
From
Andres Freund
Date:
Hi,

On Saturday 05 December 2009 00:03:12 Andreas Thiel wrote:
> I'm running PostgreSQL 8.3.6 on a 32-Bit Centos 4 machine (which I
> probably should update to 64 Bit soon)
How much memory?

> I'm going to work on the table size of the largest table (result_orig)
> itself by eliminating columns, stuffing n Booleans into bit(n)'s,
> replacing double precision by reals, etc.. By this I should be able to
> reduce the storage per row to ~1/3 of the bytes currently used.
That sounds rather ambitious - did you factor in the per-row overhead?

> I have the same information stored in an Oracle 10g DB which consumes
> only 70G data and 2G for indexes. The schema may be better optimized,
> but for sure there is a table with 4 billion rows inside as well. So
> it's about 10x smaller in disk space than PgSQL. I wonder why.
That's hard to say without seeing the table definition for both. Could you post it? 2GB for indexes sounds rather small - those are btrees?

It might also be interesting to look into the free space map to see how much empty space there is - there is a contrib module pg_freespacemap for that. You can also check how many dead tuples an 'ANALYZE VERBOSE tablename' sees.

> Is such disk usage for indexes expected? What can I do to optimize? I
> could not run yet a VACUUM on result_orig, as I hit into max_fsm_pages
> limit (still trying to adjust that one). I tried REINDEX, it didn't
> change anything.
So it's quite possible that your relations are heavily bloated - although if you reindexed, that shouldn't matter that much.

Btw, have you possibly left over some old prepared transactions or an idle-in-transaction connection? Both can lead to severe bloat. For the former you can check the system table pg_prepared_xact, for the latter pg_stat_activity.

> ### My Issue No. 2: relpages and VACUUM
> I have another table "test" which is - as starting point - created by
> INSERTs and then UPDATE'd.
> It has the same columns and roughly the same
> number of rows as table test_orig, but consumes 160 times the number of
> pages. I tried VACUUM on this table but it did not change anything on
> its relpages count. Maybe this is just because VACUUM without FULL does
> not re-claim disk space, i.e. relpages stays as it is? I did observe
> that after VACUUM, a REINDEX on this table did considerably shrink down
> the size of its indexes (test_test_id, test_lt_id).
A normal VACUUM does not move tuples around - it only marks space as free so it can later be filled. (If the free space is trailing, it tries to give that back, provided there are no locks preventing it.)

> ### My Issue No. 3: VACUUM FULL out of memory
> I tried to do a VACUUM FULL on the two tables (test, result_orig)
> mentioned above. In both cases it fails with a very low number on out of
> memory like this:
>
> ERROR:  out of memory
> DETAIL:  Failed on request of size 224.
Well, that's the amount of memory it was trying to allocate when it failed, not the amount it had already allocated. Normally the postmaster should output some sort of memory map when that happens. Did you get anything like that?

> I use these kernel settings:
> kernel.shmmni = 4096
> kernel.shmall = 2097152
> kernel.shmmax = 2147483648
> vm.overcommit_memory = 2

> max_stack_depth = 8MB                 # min 100kB
That sounds a bit too high if you count in that libc and friends may use some stack space as well - although that should be unrelated to the current issue.

> max_fsm_pages = 70000000              # min max_fsm_relations*16, 6 bytes each
As a very rough guide, you can start with the sum of relpages in pg_class for that one.

> max_fsm_relations = 4194304           # min 100, ~70 bytes each
That seems kinda high. Do you have multiple millions of relations? It might be related to the OOM situation during VACUUM FULL, although it seems rather unlikely.

> ### My Issue No. 4: Autovacuum
> I have the feeling that Autovacuum is not really running, else why are
> tables and indexes growing that much, especially "test" table?
You should see notes about autovacuum in the logs. With an autovacuum_vacuum_scale_factor of 0.2, you need changes to 0.2 times (20%) of a table's rows before autovacuum starts on it. For a billion rows that's quite a bit. I found that this setting often is too high.

> How would I check it is running correctly? I don't see any error
> messages in syslog from autovacuum.
You should see messages about it starting in the syslog.

Andres
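[Editorial sketch: the trigger point Andres describes follows the documented formula vacuum_threshold + vacuum_scale_factor * reltuples; it can be estimated per table with the settings quoted earlier in the thread (50 and 0.2).]

```sql
-- Approximate number of changed tuples needed before autovacuum
-- vacuums each table, given threshold 50 and scale factor 0.2.
SELECT relname, reltuples,
       50 + 0.2 * reltuples AS tuples_needed_to_trigger_vacuum
FROM pg_class
WHERE relname IN ('test', 'result_orig');
```

For result_orig with ~4.46e+09 rows this works out to roughly 900 million changed tuples, which is why autovacuum can appear to never run on very large tables with the default scale factor.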
Re: Large DB, several tuning questions: Index sizes, VACUUM, REINDEX, Autovacuum
From
Craig Ringer
Date:
On 5/12/2009 7:03 AM, Andreas Thiel wrote:
> Hi All,
>
> Maybe some questions are quite newbie ones, and I did try hard to scan
> all the articles and documentation, but I did not find a satisfying
> answer.

> ### My Issue No. 1: Index Size
> Is such disk usage for indexes expected? What can I do to optimize? I
> could not run yet a VACUUM on result_orig, as I hit into max_fsm_pages
> limit
You'll like 8.4 then, as you no longer have to play with max_fsm_pages. The fact that you're hitting max_fsm_pages suggests that you are probably going to be encountering table bloat.

Of course, to get to 8.4 you're going to have to go through a dump and reload of doom...

> ### My Issue No. 2: relpages and VACUUM
> I have another table "test" which is - as starting point - created by
> INSERTs and then UPDATE'd. It has the same columns and roughly the same
> number of rows as table test_orig, but consumes 160 times the number of
> pages. I tried VACUUM on this table but it did not change anything on
> its relpages count. Maybe this is just because VACUUM without FULL does
> not re-claim disk space, i.e. relpages stays as it is? I did observe
> that after VACUUM, a REINDEX on this table did considerably shrink down
> the size of its indexes (test_test_id, test_lt_id).
CLUSTER is often convenient for re-writing a highly bloated table. You'll need enough free disk space to hold the real rows from the table twice, plus the dead space once, while CLUSTER runs.

--
Craig Ringer
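[Editorial sketch of the CLUSTER approach Craig suggests, using table and index names from the thread; 8.3 supports the `CLUSTER table USING index` syntax, takes an exclusive lock, and rewrites the whole table.]

```sql
-- Rewrite the bloated table in the order of one of its indexes,
-- discarding all dead space; needs roughly 2x the live data free on disk.
CLUSTER test USING test_test_id;

-- Refresh planner statistics for the rewritten table afterwards.
ANALYZE test;
```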
Re: Large DB, several tuning questions: Index sizes, VACUUM, REINDEX, Autovacuum
From
Greg Smith
Date:
Craig Ringer wrote:
>> ### My Issue No. 1: Index Size
>> Is such disk usage for indexes expected? What can I do to optimize? I
>> could not run yet a VACUUM on result_orig, as I hit into max_fsm_pages
>> limit
>
> You'll like 8.4 then, as you no longer have to play with max_fsm_pages.
> The fact that you're hitting max_fsm_pages suggests that you are
> probably going to be encountering table bloat.
>
> Of course, to get to 8.4 you're going to have to go through a dump and
> reload of doom...

Yeah, increasing max_fsm_pages and seeing what VACUUM VERBOSE tells you afterwards is job #1, as all of the information you're getting now is useless if VACUUM is stalled out on a giant task.

It should be possible to migrate from 8.3 to 8.4 using pg_migrator rather than doing a dump and reload. I would recommend considering that as soon as possible--your options are either to learn a lot about better VACUUM practice and to be diligent to make sure you never exceed the limit in the future, or to switch to 8.4, where it will take care of itself.

You also need to be careful not to let the system run completely out of disk space before doing something about this, because CLUSTER (the only useful way to clean up after a VACUUM mistake of the magnitude you're facing now) requires making a second copy of the live data in the table as its method of cleaning things up. That option goes away once you're really low on disk space, and if you get backed into that corner you'll really be stuck.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
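[Editorial sketch of the sizing checks Greg and Andres suggest; on 8.3 a database-wide VACUUM VERBOSE prints a free-space-map usage summary at the end of its output, which can be compared against the configured max_fsm_pages/max_fsm_relations.]

```sql
-- Very rough starting point for max_fsm_pages, per Andres's advice:
-- the total number of pages in the database.
SELECT sum(relpages) AS total_relpages FROM pg_class;

-- Then run a database-wide vacuum and read the FSM summary it reports
-- at the end (pages tracked vs. max_fsm_pages, relations vs.
-- max_fsm_relations).
VACUUM VERBOSE;
```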
Re: Large DB, several tuning questions: Index sizes, VACUUM, REINDEX, Autovacuum
From
Andres Freund
Date:
Hi Andreas,

Could you please properly quote the email? The way you did it is quite unreadable, because you always have to guess who wrote what.

On Sunday 06 December 2009 17:06:39 Andreas Thiel wrote:
> > I'm going to work on the table size of the largest table (result_orig)
> > itself by eliminating columns, stuffing n Booleans into bit(n)'s,
> > replacing double precision by reals, etc.. By this I should be able to
> > reduce the storage per row to ~1/3 of the bytes currently used.
> > That sounds rather ambitious - did you factor in the per row overhead?
> I did now create the new table, I have now 63 instead of 94 bytes/row on
> average. So yes, you're right, I'm about to hit the bottom of the per-row
> overhead.
How did you calculate that? Did you factor in the alignment requirements? The DDL would be helpful...

> > Btw, have you possibly left over some old prepared transactions or an
> > idle-in-transaction connection? Both can lead to severe bloat.
> > For the former you can check the system table pg_prepared_xact, for the
> > latter pg_stat_activity.
> Seems not the case; pg_prepared_xact doesn't even exist.
It's pg_prepared_xacts (note the s), sorry, my mind played a trick on me.

> Where would I find that postmaster output? In syslog? There's nothing
> visible...
Depends on your setup. I don't have the slightest clue about CentOS. If necessary, start postmaster directly.

> > max_fsm_relations = 4194304           # min 100, ~70 bytes each
Have you corrected that value?

Andres
Re: Large DB, several tuning questions: Index sizes, VACUUM, REINDEX, Autovacuum
From
Andres Freund
Date:
On Sunday 06 December 2009 19:20:17 Andreas Thiel wrote:
> Hi Andres,
>
> Thanks a lot for your answers. As bottom line I think the answer is I
> have to rethink my DB structure.
Can't answer that one without knowing much more ;)

> > Could you please properly quote the email? The way you did it is quite
> > unreadable because you always have to guess who wrote what.
> I try to, is it now getting better? My apologies, still trying to adapt
> to using Office 07 :-)
Better, yes.

> Well, I know the data types of my columns sum up to 32 bytes right now
> (was about 100 before). As I only see a reduction of relpages/reltuples
> by 30%, not by a factor of 3, I assume that the row overhead kicks in. The
> data definition of the new table looks like this:
> bigint REFERENCES test_orig(test_id) ON DELETE CASCADE
> bigint REFERENCES part_orig(prt_id) ON DELETE CASCADE
> smallint
> bit(16)
> real
> text (usually empty in most rows)
> smallint
> I did calculate 32 bytes per row (if text is empty), but actually
> relpages/reltuples is about ~63 bytes. This would result in a per-row
> overhead of 31 bytes. Would it change anything if I remove the 2 FOREIGN
> KEY constraints?
If you remove those columns entirely, sure. If you remove only the constraints, no.

The row overhead in 8.3/8.4 is 28 bytes afaik. You miss two points in your calculation - one is alignment (i.e. an integer will only start at a 4-byte boundary) and the other is that for text you need to store the length of the column as well.

> > It's pg_prepared_xacts (note the s), sorry, my mind played a trick on me.
> Nothing inside this table as well. (I did also - while trying to improve
> postgresql.conf a few days ago - restart the server a couple of times, I
> think that would have removed any hanging transactions or prepares,
> shouldn't it?)
No, prepared transactions do not get removed by restarting. But that's fine then.
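[Editorial sketch: the per-row width being debated above can also be measured instead of calculated by hand, using pg_column_size (available in 8.3) on a whole-row value. Note this measures the stored data portion of a row; the ~28-byte heap tuple header Andres mentions comes on top of it.]

```sql
-- Average stored size of a row's data, sampled from a slice of the
-- table; compare against the hand-calculated sum of column sizes.
SELECT avg(pg_column_size(r.*)) AS avg_row_data_bytes
FROM (SELECT * FROM result_orig LIMIT 10000) AS r;
```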
> > > > > max_fsm_relations = 4194304           # min 100, ~70 bytes each
max_fsm_relations is the max number of relations you want to store in the FSM - currently that means you could have 4 million tables+indexes.

> No, but it seems at least VACUUM is now running fine and no longer
> complaining about a too small number for max_fsm_pages. Do you think if I
> reduce those two numbers, I'll have a better chance to run VACUUM FULL?
> Currently max_fsm_pages is slightly larger than relpages of my largest
> table. I read somewhere max_fsm_pages should be about 1/2 of the total
> number of relpages in a DB, maybe another way to say it should be larger
> than the largest table...
The largest table does not really have any special influence on the FSM, so I wouldn't count that rule as very good. It's not that easy to calculate the size of the FSM correctly - that's why it's gone in 8.4...

I know of several instances running with a larger fsm_pages - you could try to reduce the fsm_relations setting - I don't know if there are problems lurking with such an oversized value.

I actually doubt that that's related to the OOM you're seeing, though - what's your "maintenance_work_mem" setting, what's your /proc/sys/vm/overcommit_ratio, and how much swap do you have?

Andres
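[Editorial sketch: the contrib module pg_freespacemap that Andres mentioned earlier exposes what the FSM actually tracks; this assumes the contrib module is installed, and uses the view names shipped with 8.3.]

```sql
-- How many FSM page slots and relation slots are actually in use;
-- compare against max_fsm_pages and max_fsm_relations.
SELECT count(*) AS fsm_pages_used     FROM pg_freespacemap_pages;
SELECT count(*) AS fsm_relations_used FROM pg_freespacemap_relations;
```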
Re: Large DB, several tuning questions: Index sizes, VACUUM, REINDEX, Autovacuum
From
Scott Marlowe
Date:
On Sun, Dec 6, 2009 at 12:09 PM, Andres Freund <andres@anarazel.de> wrote:
> I know of several instances running with a larger fsm_pages - you could
> try to reduce the fsm_relations setting - I don't know if there are
> problems lurking with such an oversized value.
I run a db with 10M max_fsm_pages and 500k max_fsm_relations. We use about 4.5M pages and only 1200 or so relations. But we HAVE many more relations than that, in the 40k range, so the higher number for max relations is to make sure that if all those start getting updated we can track them too.