Thread: Surprising dead_tuple_count from pgstattuple

Surprising dead_tuple_count from pgstattuple

From
Gordon Shannon
Date:
This is an expansion of the question I posed in this thread:

http://postgresql.1045698.n5.nabble.com/Need-help-understanding-vacuum-verbose-output-tp2265895p2266912.html

I am framing the question here in relation to pgstattuple.  Running 8.4.4 on
Centos.

I have a table T with 5,063,463 rows.   It was just restored from a backup,
and there is no other activity in this database.  I ran a vacuum.

pg_stat_user_tables.n_dead_tup (which is really
pg_stat_get_dead_tuples('T'::regclass::oid)) says 0
pgstattuple says dead_tuple_count=0, free_space=1,355,152

1. I delete 10,000 rows.

pg_stat_user_tables.n_live_tup -> 5053463
pg_stat_user_tables.n_dead_tup -> 10000
pgstattuple.dead_tuple_count -> 10000
pgstattuple.free_space -> 1355152

So far, so good. pgstattuple is counting the dead tuples, and not including
those tuples in the free space count.

2. I delete 15,000 more rows.

pg_stat_user_tables.n_live_tup -> 5038463
pg_stat_user_tables.n_dead_tup -> 25000
pgstattuple.dead_tuple_count -> 15000 ??
pgstattuple.free_space -> 1996904 ??

pgstattuple now appears to count the earlier 10K deleted tuples as no longer
dead, but free space.

3. I delete 50,000 more rows.

pg_stat_user_tables.n_live_tup -> 4988463
pg_stat_user_tables.n_dead_tup -> 75000
pgstattuple.dead_tuple_count -> 50022 ??
pgstattuple.free_space -> 2966628 ??

Same thing, pgstattuple appears to "see" only the most recent delete
transaction (but off by 22), and count the prior ones as free.

4. vacuum verbose

vacuum verbose t;
INFO:  vacuuming "public.t"
INFO:  scanned index "t_pkey" to remove 75000 row versions
DETAIL:  CPU 0.01s/0.38u sec elapsed 0.40 sec.
INFO:  "t": removed 75000 row versions in 637 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "t_pkey" now contains 4988463 row versions in 13886 pages
DETAIL:  75000 index row versions were removed.
204 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "t": found 50022 removable, 3696 nonremovable row versions in 668 out
of 51958 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.01s/0.39u sec elapsed 0.40 sec.
VACUUM
Time: 482.771 ms

It seems relevant that vacuum reports the same incorrect number -- 50022 --
as part of its output.  That makes me think that pgstattuple may be using
similar logic to get its dead tuple count.
I wonder if the key to this is that pgstattuple uses
HeapTupleSatisfiesVisibility() to test for deadness.  If so, why would this
call return apparently false positives?  

I know that pgstattuple is meant to be used for debugging only. I have found
pgstatindex to be very helpful in identifying bloat in my indexes.  
Per Tom in the other thread, I now understand that the "found 50022
removable, 3696 nonremovable...." line is referring to the subset of pages
that it scanned looking for dead tuples.

I keep coming back to this, though -- 50,022 seems to be just wrong, or
perhaps simply misleading -- i.e. way too low. 
It's present in the output of vacuum, and the output of pgstattuple.
I'd like to understand what meaning this number has, and, ideally, how I can
use to to detect things like bloat or fragmentation.

Thanks!

-- 
View this message in context:
http://postgresql.1045698.n5.nabble.com/Surprising-dead-tuple-count-from-pgstattuple-tp2266955p2266955.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


Re: Surprising dead_tuple_count from pgstattuple

From
Itagaki Takahiro
Date:
2010/8/7 Gordon Shannon <gordo169@gmail.com>:
> 1. I delete 10,000 rows.
> pgstattuple.dead_tuple_count -> 10000
>
> 2. I delete 15,000 more rows.
> pgstattuple.dead_tuple_count -> 15000 ??
>
> pgstattuple now appears to count the earlier 10K deleted tuples as no longer
> dead, but free space.

I think it's expected behavior that comes from HOT page reclaim.
The second DELETE not only deleted rows but also removed physical
tuples that were deleted in 1. Missing dead rows were pruned by HOT.

-- 
Itagaki Takahiro


Re: Surprising dead_tuple_count from pgstattuple

From
Robert Haas
Date:
On Fri, Aug 6, 2010 at 9:11 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> 2010/8/7 Gordon Shannon <gordo169@gmail.com>:
>> 1. I delete 10,000 rows.
>> pgstattuple.dead_tuple_count -> 10000
>>
>> 2. I delete 15,000 more rows.
>> pgstattuple.dead_tuple_count -> 15000 ??
>>
>> pgstattuple now appears to count the earlier 10K deleted tuples as no longer
>> dead, but free space.
>
> I think it's expected behavior that comes from HOT page reclaim.
> The second DELETE not only deleted rows but also removed physical
> tuples that were deleted in 1. Missing dead rows were pruned by HOT.

What would have triggered a HOT prune at any point in this operation?
And why would it have reclaimed all of the deleted rows?

My thought would be "is autovacuum running in the background in
between these commands?".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Surprising dead_tuple_count from pgstattuple

From
Gordon Shannon
Date:

Robert Haas wrote:
> 
> My thought would be "is autovacuum running in the background in
> between these commands?".
> 

That's a good thought, but no, autovacuum_vacuum_scale_factor is set to 0.2,
meaning that over 1 million dead tuples are necessary for autovacuum. 
Besides, if autovacuum had run, I think the pg_stat_user_tables.n_dead_tup
would have reset to zero, as it did after my manual vacuum.

Regarding HOT prune, I never did any updates, so I think there couldn't be
any HOT tuples.  Or does HOT prune do more than that?
-- 
View this message in context:
http://postgresql.1045698.n5.nabble.com/Surprising-dead-tuple-count-from-pgstattuple-tp2266955p2267263.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


Re: Surprising dead_tuple_count from pgstattuple

From
Heikki Linnakangas
Date:
On 07/08/10 07:43, Gordon Shannon wrote:
> Regarding HOT prune, I never did any updates, so I think there couldn't be
> any HOT tuples.  Or does HOT prune do more than that?

Yes, HOT will also prune away DELETEd tuples. It will leave behind a 
dead line pointer, so it won't stop the table from growing if you 
repeatedly delete and insert, but it will slow it down significantly.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Surprising dead_tuple_count from pgstattuple

From
Gordon Shannon
Date:
I think this simple test highlights the question well.

------------------
create temporary table t(x int) with (autovacuum_enabled=off);
insert into t select x from generate_series(1,10000,1)x;
vacuum verbose t;
select dead_tuple_count from pgstattuple('t');    --> 0
delete from t where x <= 100;
select dead_tuple_count from pgstattuple('t');    --> 100
delete from t where x <= 300;
select dead_tuple_count from pgstattuple('t');    --> 200 (not 300)
vacuum verbose t;
vacuum verbose t;
select dead_tuple_count from pgstattuple('t');    --> 0
drop table t;
------------------

When we get to the 2 vacuums at the end, I expect it to vacuum 300 rows, but
it only appears to vacuum 200...

vacuum verbose t;
psql:test:15: INFO:  vacuuming "pg_temp_2.t"
psql:test:15: INFO:  "t": removed 200 row versions in 2 pages
psql:test:15: INFO:  "t": found 200 removable, 9700 nonremovable row
versions in 45 out of 45 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 69.792 ms

Yet, the next vacuum reports that it found 300 unused item pointers. So they
were all vacuumed by somebody.

psql:test:16: INFO:  vacuuming "pg_temp_2.t"
psql:test:16: INFO:  "t": found 0 removable, 7158 nonremovable row versions
in 33 out of 45 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 300 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 38.436 ms

So, I guess my real question here is, what happened to the "missing" 100
items?  If it was HOT prune, can anyone summarize what that does?

Thanks!
-- 
View this message in context:
http://postgresql.1045698.n5.nabble.com/Surprising-dead-tuple-count-from-pgstattuple-tp2266955p2267745.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


Re: Surprising dead_tuple_count from pgstattuple

From
"Kevin Grittner"
Date:
Gordon Shannon  wrote:
> If it was HOT prune, can anyone summarize what that does?
Get a copy of the PostgreSQL source, and read this file:
src/backend/access/heap/README.HOT
-Kevin



Re: Surprising dead_tuple_count from pgstattuple

From
Simon Riggs
Date:
On Sat, 2010-08-07 at 16:11 -0700, Gordon Shannon wrote:

> So, I guess my real question here is, what happened to the "missing"
> 100 items?  If it was HOT prune, can anyone summarize what that does?

Itagaki already explained that the second DELETE would have removed the
100 dead rows you consider to be missing.

Any SQL statement that reads a block can do HOT pruning, if the block is
otherwise unlocked.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: Surprising dead_tuple_count from pgstattuple

From
Robert Haas
Date:
On Mon, Aug 9, 2010 at 1:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, 2010-08-07 at 16:11 -0700, Gordon Shannon wrote:
>
>> So, I guess my real question here is, what happened to the "missing"
>> 100 items?  If it was HOT prune, can anyone summarize what that does?
>
> Itagaki already explained that the second DELETE would have removed the
> 100 dead rows you consider to be missing.
>
> Any SQL statement that reads a block can do HOT pruning, if the block is
> otherwise unlocked.

Where does heap_page_prune() get called from in the DELETE path?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Surprising dead_tuple_count from pgstattuple

From
Heikki Linnakangas
Date:
On 09/08/10 21:21, Robert Haas wrote:
> On Mon, Aug 9, 2010 at 1:50 PM, Simon Riggs<simon@2ndquadrant.com>  wrote:
>> Any SQL statement that reads a block can do HOT pruning, if the block is
>> otherwise unlocked.
>
> Where does heap_page_prune() get called from in the DELETE path?

heapgetpage()

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Surprising dead_tuple_count from pgstattuple

From
Robert Haas
Date:
On Mon, Aug 9, 2010 at 2:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 09/08/10 21:21, Robert Haas wrote:
>>
>> On Mon, Aug 9, 2010 at 1:50 PM, Simon Riggs<simon@2ndquadrant.com>  wrote:
>>>
>>> Any SQL statement that reads a block can do HOT pruning, if the block is
>>> otherwise unlocked.
>>
>> Where does heap_page_prune() get called from in the DELETE path?
>
> heapgetpage()

Ah, OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Surprising dead_tuple_count from pgstattuple

From
Itagaki Takahiro
Date:
2010/8/10 Simon Riggs <simon@2ndquadrant.com>:
> Any SQL statement that reads a block can do HOT pruning, if the block is
> otherwise unlocked.

We use the term "HOT" for two features: 1. HOT updates: Avoiding index updates when keys are not modified. 2. HOT
pruning:Removing tuple bodies, that works even for indexed tuples.
 

2 is the point of the case, but ambiguous "HOT" might confuse some people.

-- 
Itagaki Takahiro


Re: Surprising dead_tuple_count from pgstattuple

From
Simon Riggs
Date:
On Tue, 2010-08-10 at 07:43 +0900, Itagaki Takahiro wrote:
> 2010/8/10 Simon Riggs <simon@2ndquadrant.com>:
> > Any SQL statement that reads a block can do HOT pruning, if the block is
> > otherwise unlocked.
> 
> We use the term "HOT" for two features:
>   1. HOT updates: Avoiding index updates when keys are not modified.
>   2. HOT pruning: Removing tuple bodies, that works even for indexed tuples.
> 
> 2 is the point of the case, but ambiguous "HOT" might confuse some people.

Good point. We should say "HOT updates" and "pruning" as separate
techniques. Itagaki invented the pruning technique and deserves much
credit for that.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: Surprising dead_tuple_count from pgstattuple

From
Gordon Shannon
Date:
After much code reading, testing, and using the extremely handy pageinspect
contrib to look at pages, here's what I believe is happening. I am not
attempting to describe every possible scenario, only this one test path. 
Following my short test scenario above...

- Inserted rows get line pointers with status 1 (LP_NORMAL)

- When I do the 100 row delete, those rows stay status 1, but get their
t_xmax set, indicating they were deleted.

- When I do the next 200 row delete, those rows also get their t_xmax set.  

- As a side-effect to the deletion, the "hot prune" feature kicks in.  The 2
pages in question were processed by a call to heap_page_prune, which set the
line pointers of the previous 100 deleted rows to 3 (LP_DEAD).  Now I have
100 LP_DEAD and 200 LP_NORMAL with xmax set.

- When I do the vacuum, all 300 are "vacuumed" -- line pointer set to status
0 (LP_UNUSED).

- Because my table has no indexes, lazy_scan_heap calls lazy_vacuum_page
directly for each block, and reports the variable tups_vacuumed ("removed
200 row versions in 2 pages").  However, tups_vacuumed is computed without
counting the 100 LP_DEAD tuples, because per the code comment, it thinks to
do so would be double-counting. Perhaps the output should say something
like:
removed 300 row versions (200 were recently deleted and 100 were previously
deleted).

Whatever the phrasing, I don't know why 200 is the most significant number
here, and 300 isn't mentioned at all.

- If my table did have indexes, as in the original test case of this thread,
then the "removed xxx row version in yyy pages" message comes from
lazy_vacuum_heap instead.  However, instead of using tups_vacuumed, this
code reports the actual number of tuples actually set to status 0. 

I would like to respectfully suggest that the vacuum output line "removed xx
row versions in yy pages" should show the same counts regardless of whether
or not there's an index on the table. I would suggest that the value
reported by lazy_vacuum_heap is correct, and is what I would expect to see. 
I think it would be fine if it also reports the breakdown of LP_DEAD vs
LP_NORMAL tuples vacuumed, if that is deemed useful.

Regarding the output of pgstattuple, via the call to
HeapTupleSatisfiesVisibility, it appears that this simply returns true for
these hot-pruned LP_DEAD tuples, skewing the counts accordingly.

Does that make sense?

-- 
View this message in context:
http://postgresql.1045698.n5.nabble.com/Surprising-dead-tuple-count-from-pgstattuple-tp2266955p2471232.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


Re: Surprising dead_tuple_count from pgstattuple

From
Tom Lane
Date:
Gordon Shannon <gordo169@gmail.com> writes:
> - Because my table has no indexes, lazy_scan_heap calls lazy_vacuum_page
> directly for each block, and reports the variable tups_vacuumed ("removed
> 200 row versions in 2 pages").  However, tups_vacuumed is computed without
> counting the 100 LP_DEAD tuples, because per the code comment, it thinks to
> do so would be double-counting. Perhaps the output should say something
> like:

>     removed 300 row versions (200 were recently deleted and 100 were previously
> deleted).

Well, the problem is to tell which LP_DEAD rows are "recently" deleted.
I agree the output leaves something to be desired, but I don't see how
to improve it without tracking a lot more state than we do now.  It's
not clear that it's really worth much effort; how many people look at
VACUUM VERBOSE output at all, let alone closely?

> Whatever the phrasing, I don't know why 200 is the most significant number
> here, and 300 isn't mentioned at all.

IIRC, the reason for choosing to do it that way is that 200, and not
300, is the best indication of the amount of space reclaimed.  LP_DEAD
tuple headers don't take a lot of space.  We're trying to give an idea
of how many "real" tuples got reclaimed during vacuum.

> I would like to respectfully suggest that the vacuum output line "removed xx
> row versions in yy pages" should show the same counts regardless of whether
> or not there's an index on the table.

But in fact the presence of an index does affect the behavior, above and
beyond vacuum's counting or failure to count, because it changes what HOT
updating and HOT pruning will do.  So the initial state that vacuum is
dealing with could well be different.
        regards, tom lane