Thread: Surprising dead_tuple_count from pgstattuple
This is an expansion of the question I posed in this thread:
http://postgresql.1045698.n5.nabble.com/Need-help-understanding-vacuum-verbose-output-tp2265895p2266912.html
I am framing the question here in relation to pgstattuple. Running 8.4.4 on CentOS.

I have a table T with 5,063,463 rows. It was just restored from a backup, and
there is no other activity in this database. I ran a vacuum.

pg_stat_user_tables.n_dead_tup (which is really
pg_stat_get_dead_tuples('T'::regclass::oid)) says 0.
pgstattuple says dead_tuple_count=0, free_space=1,355,152.

1. I delete 10,000 rows.

pg_stat_user_tables.n_live_tup -> 5053463
pg_stat_user_tables.n_dead_tup -> 10000
pgstattuple.dead_tuple_count -> 10000
pgstattuple.free_space -> 1355152

So far, so good. pgstattuple is counting the dead tuples, and not including
those tuples in the free space count.

2. I delete 15,000 more rows.

pg_stat_user_tables.n_live_tup -> 5038463
pg_stat_user_tables.n_dead_tup -> 25000
pgstattuple.dead_tuple_count -> 15000 ??
pgstattuple.free_space -> 1996904 ??

pgstattuple now appears to count the earlier 10K deleted tuples not as dead,
but as free space.

3. I delete 50,000 more rows.

pg_stat_user_tables.n_live_tup -> 4988463
pg_stat_user_tables.n_dead_tup -> 75000
pgstattuple.dead_tuple_count -> 50022 ??
pgstattuple.free_space -> 2966628 ??

Same thing: pgstattuple appears to "see" only the most recent delete
transaction (but off by 22), and counts the prior ones as free.

4. vacuum verbose

vacuum verbose t;
INFO: vacuuming "public.t"
INFO: scanned index "t_pkey" to remove 75000 row versions
DETAIL: CPU 0.01s/0.38u sec elapsed 0.40 sec.
INFO: "t": removed 75000 row versions in 637 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "t_pkey" now contains 4988463 row versions in 13886 pages
DETAIL: 75000 index row versions were removed.
204 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "t": found 50022 removable, 3696 nonremovable row versions in 668 out of 51958 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.01s/0.39u sec elapsed 0.40 sec.
VACUUM
Time: 482.771 ms

It seems relevant that vacuum reports the same incorrect number -- 50022 -- as
part of its output. That makes me think that pgstattuple may be using similar
logic to get its dead tuple count. I wonder if the key to this is that
pgstattuple uses HeapTupleSatisfiesVisibility() to test for deadness. If so,
why would this call return apparent false positives?

I know that pgstattuple is meant to be used for debugging only. I have found
pgstatindex to be very helpful in identifying bloat in my indexes.

Per Tom in the other thread, I now understand that the "found 50022 removable,
3696 nonremovable..." line refers to the subset of pages that vacuum scanned
looking for dead tuples. I keep coming back to this, though -- 50,022 seems to
be just wrong, or perhaps simply misleading -- i.e. way too low. It's present
in the output of vacuum, and in the output of pgstattuple. I'd like to
understand what meaning this number has, and, ideally, how I can use it to
detect things like bloat or fragmentation. Thanks!

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Surprising-dead-tuple-count-from-pgstattuple-tp2266955p2266955.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
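To make the two sets of numbers easier to compare in one shot, here is a
sketch of a query that pulls both at once (this assumes the pgstattuple
contrib module is installed; the table name t is from my scenario above,
adjust as needed):

```sql
-- Side-by-side view of the stats collector's counters and pgstattuple's
-- physical scan of the heap, for a table named t (name is illustrative).
SELECT s.n_live_tup,
       s.n_dead_tup,
       p.dead_tuple_count,
       p.free_space
FROM pg_stat_user_tables AS s,
     pgstattuple('t') AS p
WHERE s.relname = 't';
```

Note that the two sources can legitimately disagree: the stats collector
counts events it was told about, while pgstattuple re-scans the heap as it
exists right now.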
2010/8/7 Gordon Shannon <gordo169@gmail.com>:
> 1. I delete 10,000 rows.
> pgstattuple.dead_tuple_count -> 10000
>
> 2. I delete 15,000 more rows.
> pgstattuple.dead_tuple_count -> 15000 ??
>
> pgstattuple now appears to count the earlier 10K deleted tuples as no longer
> dead, but free space.

I think this is expected behavior that comes from HOT page reclaim. The second
DELETE not only deleted rows but also removed the physical tuples that were
deleted in step 1. The missing dead rows were pruned by HOT.

--
Itagaki Takahiro
On Fri, Aug 6, 2010 at 9:11 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> 2010/8/7 Gordon Shannon <gordo169@gmail.com>:
>> 1. I delete 10,000 rows.
>> pgstattuple.dead_tuple_count -> 10000
>>
>> 2. I delete 15,000 more rows.
>> pgstattuple.dead_tuple_count -> 15000 ??
>>
>> pgstattuple now appears to count the earlier 10K deleted tuples as no longer
>> dead, but free space.
>
> I think it's expected behavior that comes from HOT page reclaim.
> The second DELETE not only deleted rows but also removed physical
> tuples that were deleted in 1. Missing dead rows were pruned by HOT.

What would have triggered a HOT prune at any point in this operation? And why
would it have reclaimed all of the deleted rows? My thought would be "is
autovacuum running in the background in between these commands?".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas wrote:
>
> My thought would be "is autovacuum running in the background in
> between these commands?".
>

That's a good thought, but no: autovacuum_vacuum_scale_factor is set to 0.2,
meaning that over 1 million dead tuples would be necessary to trigger
autovacuum on this table. Besides, if autovacuum had run, I think
pg_stat_user_tables.n_dead_tup would have reset to zero, as it did after my
manual vacuum.

Regarding HOT prune, I never did any updates, so I think there couldn't be any
HOT tuples. Or does HOT prune do more than that?
On 07/08/10 07:43, Gordon Shannon wrote:
> Regarding HOT prune, I never did any updates, so I think there couldn't be
> any HOT tuples. Or does HOT prune do more than that?

Yes, HOT pruning will also prune away DELETEd tuples. It leaves behind a dead
line pointer, so it won't stop the table from growing if you repeatedly delete
and insert, but it will slow the growth down significantly.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
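One way to watch this happening (a sketch, assuming the pageinspect contrib
module is installed and a table named t whose deleted rows sit on page 0) is
to inspect the line pointers directly; after a prune, deleted tuples show up
with lp_flags = 3 instead of 1:

```sql
-- Line-pointer states on page 0 of t.  lp_flags meanings:
-- 0 = LP_UNUSED, 1 = LP_NORMAL, 2 = LP_REDIRECT, 3 = LP_DEAD.
SELECT lp, lp_flags, t_xmax
FROM heap_page_items(get_raw_page('t', 0));
```

A pruned-away DELETEd tuple keeps only its (dead) line pointer; the tuple body
itself is gone, which is why pgstattuple counts that space as free rather
than dead.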
I think this simple test highlights the question well.

------------------
create temporary table t(x int) with (autovacuum_enabled=off);
insert into t select x from generate_series(1,10000,1) x;
vacuum verbose t;
select dead_tuple_count from pgstattuple('t');  --> 0
delete from t where x <= 100;
select dead_tuple_count from pgstattuple('t');  --> 100
delete from t where x <= 300;
select dead_tuple_count from pgstattuple('t');  --> 200 (not 300)
vacuum verbose t;
vacuum verbose t;
select dead_tuple_count from pgstattuple('t');  --> 0
drop table t;
------------------

When we get to the two vacuums at the end, I expect the first to vacuum 300
rows, but it only appears to vacuum 200:

vacuum verbose t;
psql:test:15: INFO: vacuuming "pg_temp_2.t"
psql:test:15: INFO: "t": removed 200 row versions in 2 pages
psql:test:15: INFO: "t": found 200 removable, 9700 nonremovable row versions in 45 out of 45 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 69.792 ms

Yet the next vacuum reports that it found 300 unused item pointers, so they
were all vacuumed by somebody:

psql:test:16: INFO: vacuuming "pg_temp_2.t"
psql:test:16: INFO: "t": found 0 removable, 7158 nonremovable row versions in 33 out of 45 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 300 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 38.436 ms

So, I guess my real question here is: what happened to the "missing" 100
items? If it was HOT prune, can anyone summarize what that does? Thanks!
Gordon Shannon wrote:
> If it was HOT prune, can anyone summarize what that does?

Get a copy of the PostgreSQL source, and read this file:

src/backend/access/heap/README.HOT

-Kevin
On Sat, 2010-08-07 at 16:11 -0700, Gordon Shannon wrote:
> So, I guess my real question here is, what happened to the "missing"
> 100 items? If it was HOT prune, can anyone summarize what that does?

Itagaki already explained that the second DELETE would have removed the 100
dead rows you consider to be missing.

Any SQL statement that reads a block can do HOT pruning, if the block is
otherwise unlocked.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
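That point can be demonstrated with a minimal sketch (assuming autovacuum is
disabled for the table, as in the earlier test script; whether a given scan
actually prunes a page depends on the page's free space and on the oldest
transaction still running):

```sql
DELETE FROM t WHERE x <= 100;  -- leaves 100 dead tuples behind
SELECT count(*) FROM t;        -- a plain seq scan visits every page and may prune
SELECT dead_tuple_count FROM pgstattuple('t');
-- dead_tuple_count can already be lower than 100 here if pruning occurred
```

So the pruning in the original scenario did not need the second DELETE per se;
any statement that read those heap pages could have done it.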
On Mon, Aug 9, 2010 at 1:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, 2010-08-07 at 16:11 -0700, Gordon Shannon wrote:
>
>> So, I guess my real question here is, what happened to the "missing"
>> 100 items? If it was HOT prune, can anyone summarize what that does?
>
> Itagaki already explained that the second DELETE would have removed the
> 100 dead rows you consider to be missing.
>
> Any SQL statement that reads a block can do HOT pruning, if the block is
> otherwise unlocked.

Where does heap_page_prune() get called from in the DELETE path?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On 09/08/10 21:21, Robert Haas wrote:
> On Mon, Aug 9, 2010 at 1:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Any SQL statement that reads a block can do HOT pruning, if the block is
>> otherwise unlocked.
>
> Where does heap_page_prune() get called from in the DELETE path?

heapgetpage()

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Aug 9, 2010 at 2:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 09/08/10 21:21, Robert Haas wrote:
>> On Mon, Aug 9, 2010 at 1:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Any SQL statement that reads a block can do HOT pruning, if the block is
>>> otherwise unlocked.
>>
>> Where does heap_page_prune() get called from in the DELETE path?
>
> heapgetpage()

Ah, OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/8/10 Simon Riggs <simon@2ndquadrant.com>:
> Any SQL statement that reads a block can do HOT pruning, if the block is
> otherwise unlocked.

We use the term "HOT" for two features:
1. HOT updates: avoiding index updates when the indexed keys are not modified.
2. HOT pruning: removing tuple bodies, which works even for indexed tuples.

2 is the point of this case, but the ambiguous "HOT" might confuse some people.

--
Itagaki Takahiro
On Tue, 2010-08-10 at 07:43 +0900, Itagaki Takahiro wrote:
> 2010/8/10 Simon Riggs <simon@2ndquadrant.com>:
>> Any SQL statement that reads a block can do HOT pruning, if the block is
>> otherwise unlocked.
>
> We use the term "HOT" for two features:
> 1. HOT updates: Avoiding index updates when keys are not modified.
> 2. HOT pruning: Removing tuple bodies, that works even for indexed tuples.
>
> 2 is the point of the case, but ambiguous "HOT" might confuse some people.

Good point. We should say "HOT updates" and "pruning" as separate techniques.
Itagaki invented the pruning technique and deserves much credit for that.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
After much code reading, testing, and using the extremely handy pageinspect
contrib module to look at pages, here's what I believe is happening. I am not
attempting to describe every possible scenario, only this one test path.
Following my short test scenario above:

- Inserted rows get line pointers with status 1 (LP_NORMAL).

- When I do the 100-row delete, those rows stay status 1, but get their t_xmax
set, indicating they were deleted.

- When I do the next 200-row delete, those rows also get their t_xmax set.

- As a side effect of the deletion, the "HOT prune" feature kicks in. The 2
pages in question were processed by a call to heap_page_prune, which set the
line pointers of the previously deleted 100 rows to status 3 (LP_DEAD). Now I
have 100 LP_DEAD and 200 LP_NORMAL with t_xmax set.

- When I do the vacuum, all 300 are "vacuumed" -- their line pointers are set
to status 0 (LP_UNUSED).

- Because my table has no indexes, lazy_scan_heap calls lazy_vacuum_page
directly for each block, and reports the variable tups_vacuumed ("removed 200
row versions in 2 pages"). However, tups_vacuumed is computed without counting
the 100 LP_DEAD tuples, because, per the code comment, counting them would be
double-counting. Perhaps the output should say something like: "removed 300
row versions (200 were recently deleted and 100 were previously deleted)".
Whatever the phrasing, I don't know why 200 is the most significant number
here, and 300 isn't mentioned at all.

- If my table did have indexes, as in the original test case of this thread,
then the "removed xxx row versions in yyy pages" message comes from
lazy_vacuum_heap instead. However, instead of using tups_vacuumed, that code
reports the actual number of tuples set to status 0.

I would like to respectfully suggest that the vacuum output line "removed xx
row versions in yy pages" should show the same counts regardless of whether or
not there's an index on the table. I would suggest that the value reported by
lazy_vacuum_heap is correct, and is what I would expect to see. I think it
would be fine if it also reported the breakdown of LP_DEAD vs LP_NORMAL tuples
vacuumed, if that is deemed useful.

Regarding the output of pgstattuple: via the call to
HeapTupleSatisfiesVisibility, it appears that this simply returns true for
these hot-pruned LP_DEAD tuples, skewing the counts accordingly. Does that
make sense?
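The walkthrough above can be checked with a query like the following (a
sketch, pageinspect assumed installed; run it after the two deletes but before
the vacuum, against one of the two affected pages):

```sql
-- Tally of line-pointer states on page 0 of t.  In the scenario above,
-- the pruned first delete shows up as lp_flags = 3 (LP_DEAD), while the
-- second delete's tuples remain at lp_flags = 1 (LP_NORMAL) with t_xmax set.
SELECT lp_flags, count(*)
FROM heap_page_items(get_raw_page('t', 0))
GROUP BY lp_flags
ORDER BY lp_flags;
```

Running the same query after the vacuum should show everything collapsed to
lp_flags = 0 (LP_UNUSED).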
Gordon Shannon <gordo169@gmail.com> writes:
> - Because my table has no indexes, lazy_scan_heap calls lazy_vacuum_page
> directly for each block, and reports the variable tups_vacuumed ("removed
> 200 row versions in 2 pages"). However, tups_vacuumed is computed without
> counting the 100 LP_DEAD tuples, because per the code comment, it thinks to
> do so would be double-counting. Perhaps the output should say something
> like:
> removed 300 row versions (200 were recently deleted and 100 were previously
> deleted).

Well, the problem is to tell which LP_DEAD rows are "recently" deleted. I
agree the output leaves something to be desired, but I don't see how to
improve it without tracking a lot more state than we do now. It's not clear
that it's really worth much effort; how many people look at VACUUM VERBOSE
output at all, let alone closely?

> Whatever the phrasing, I don't know why 200 is the most significant number
> here, and 300 isn't mentioned at all.

IIRC, the reason for choosing to do it that way is that 200, and not 300, is
the best indication of the amount of space reclaimed. LP_DEAD tuple headers
don't take a lot of space. We're trying to give an idea of how many "real"
tuples got reclaimed during vacuum.

> I would like to respectfully suggest that the vacuum output line "removed xx
> row versions in yy pages" should show the same counts regardless of whether
> or not there's an index on the table.

But in fact the presence of an index does affect the behavior, above and
beyond vacuum's counting or failure to count, because it changes what HOT
updating and HOT pruning will do. So the initial state that vacuum is dealing
with could well be different.

			regards, tom lane