Thread: Vacuum stats interpreted?
Our largest (by total byte count) table has a nearly-equal flow of data in and out on a daily basis (that is, we add some 40k rows during the day, and then every night expire stuff that is timestamped beyond our cutoff, which results in about 40k deletions). After the deletions, the table gets vacuumed (not FULL): INFO: --Relation public.articles_content-- INFO: Index ix_articles_content_pk: Pages 398; Tuples 180175: Deleted 0. CPU 0.00s/0.08u sec elapsed 0.08 sec. INFO: Removed 122 tuples in 77 pages. CPU 0.00s/0.01u sec elapsed 0.02 sec. INFO: Pages 3886: Changed 0, Empty 0; Tup 180175: Vac 122, Keep 26437, UnUsed 135721. Total CPU 0.02s/0.17u sec elapsed 0.19 sec. INFO: --Relation pg_toast.pg_toast_634643688-- INFO: Index pg_toast_634643688_index: Pages 27156; Tuples 1256923: Deleted 732. CPU 2.32s/0.80u sec elapsed 27.93 sec. INFO: Removed 732 tuples in 250 pages. CPU 0.01s/0.04u sec elapsed 0.05 sec. INFO: Pages 1667633: Changed 71514, Empty 0; Tup 1256923: Vac 732, Keep 172914, UnUsed 5499031. Total CPU 148.07s/12.52u sec elapsed 809.71 sec. INFO: Analyzing public.articles_content The "Keep" and "UnUsed" numbers seem high to me, compared to other tables. Can anyone interpret these and tell me anything about what we could do with this table to make it "look" better? (I'm not convinced that the extra space affects anything but sequential scans, and we don't do that to this table.) If it helps: I have a theory that database connections from our webserver (which is running PHP and mod_perl) are in "idle in transaction" states (in other words, they issue "Commit; Begin;" and then sleep), which causes the tables to retain bulk (nice image, eh?).
Ah, I should have added: we are using PG 7.3.4.
To test your theory about COMMIT; BEGIN; you could check the PostgreSQL log, it is likely to contain a line: WARNING: there is no transaction in progress "Jeff Boes" <jboes@qtm.net> wrote in message news:6b79e0aa2a1c3668937e1ce087c4a819@news.teranews.com... > Our largest (by total byte count) table has a nearly-equal flow of data in and > out on a daily basis (that is, we add some 40k rows during the day, and then > every night expire stuff that is timestamped beyond our cutoff, which results in > about 40k deletions). > > After the deletions, the table gets vacuumed (not FULL): > > INFO: --Relation public.articles_content-- > INFO: Index ix_articles_content_pk: Pages 398; Tuples 180175: Deleted 0. > CPU 0.00s/0.08u sec elapsed 0.08 sec. > INFO: Removed 122 tuples in 77 pages. > CPU 0.00s/0.01u sec elapsed 0.02 sec. > INFO: Pages 3886: Changed 0, Empty 0; Tup 180175: Vac 122, Keep 26437, UnUsed > 135721. > Total CPU 0.02s/0.17u sec elapsed 0.19 sec. > INFO: --Relation pg_toast.pg_toast_634643688-- > INFO: Index pg_toast_634643688_index: Pages 27156; Tuples 1256923: Deleted 732. > CPU 2.32s/0.80u sec elapsed 27.93 sec. > INFO: Removed 732 tuples in 250 pages. > CPU 0.01s/0.04u sec elapsed 0.05 sec. > INFO: Pages 1667633: Changed 71514, Empty 0; Tup 1256923: Vac 732, Keep 172914, > UnUsed 5499031. > Total CPU 148.07s/12.52u sec elapsed 809.71 sec. > INFO: Analyzing public.articles_content > > The "Keep" and "UnUsed" numbers seem high to me, compared to other tables. Can > anyone interpret these and tell me anything about what we could do with this > table to make it "look" better? (I'm not convinced that the extra space affects > anything but sequential scans, and we don't do that to this table.) > > If it helps: I have a theory that database connections from our webserver (which > is running PHP and mod_perl) are in "idle in transaction" states (in other > words, they issue "Commit; Begin;" and then sleep), which causes the tables to > retain bulk (nice image, eh?). >
Jeff Boes <jboes@qtm.net> writes: > The "Keep" and "UnUsed" numbers seem high to me, compared to other tables. Can > anyone interpret these and tell me anything about what we could do with this > table to make it "look" better? "Keep" is the number of rows that are committed dead but had to be kept anyway, because there are open transactions old enough to still potentially see them. The only way to reduce that is to not have old transactions hanging 'round while you vacuum. UnUsed is the number of empty line-pointer slots. At 4 bytes apiece, this would have to vastly exceed the number of live tuples before you should worry much. 7.4 labels these numbers in a hopefully more user-friendly way, btw. regards, tom lane
At some point in time, tgl@sss.pgh.pa.us (Tom Lane) wrote: >Jeff Boes <jboes@qtm.net> writes: >> The "Keep" and "UnUsed" numbers seem high to me, compared to other tables. Can >> anyone interpret these and tell me anything about what we could do with this >> table to make it "look" better? > >"Keep" is the number of rows that are committed dead but had to be kept >anyway, because there are open transactions old enough to still >potentially see them. The only way to reduce that is to not have old >transactions hanging 'round while you vacuum. > >UnUsed is the number of empty line-pointer slots. At 4 bytes apiece, >this would have to vastly exceed the number of live tuples before you >should worry much. For which values of "vastly"? I have a small table (1-2k rows) which has a ratio of UnUsed:Tuples of 50-500. The table in question has a ratio of about 10 or 11:1. For some tables (not this one), we find that it significantly improves performance (of non-indexed queries) to pg_dump and reload the table periodically. I've been asked to try to quantify (from these vacuum numbers) when we can predict that a dump-and-reload would be valuable.
Jeff Boes <jboes@nexcerpt.com> writes: > At some point in time, tgl@sss.pgh.pa.us (Tom Lane) wrote: >> UnUsed is the number of empty line-pointer slots. At 4 bytes apiece, >> this would have to vastly exceed the number of live tuples before you >> should worry much. > For which values of "vastly"? I have a small table (1-2k rows) which has a ratio > of UnUsed:Tuples of 50-500. That sounds like a lot to me too. You should probably VACUUM FULL and then try to increase the frequency of regular vacuums to cut down on the accumulation of deadwood. > The table in question has a ratio of about 10 or 11:1. It did? I saw about 1:1: >> INFO: Pages 3886: Changed 0, Empty 0; Tup 180175: Vac 122, Keep 26437, UnUsed 135721. which is why I didn't feel a need to panic about it. > For some tables (not this one), we find that it significantly improves > performance (of non-indexed queries) to pg_dump and reload the table > periodically. I've been asked to try to quantify (from these vacuum numbers) > when we can predict that a dump-and-reload would be valuable. For non-indexed scans I would think that the fraction of free space (hence, useless I/O) would be the number you are after. VACUUM does not really offer this, but see contrib/pgstattuple. regards, tom lane