Thread: Re: Count and log pages set all-frozen by vacuum

Re: Count and log pages set all-frozen by vacuum

From
Alastair Turner
Date:
Hi Melanie

On Wed, 30 Oct 2024 at 21:42, Melanie Plageman <melanieplageman@gmail.com> wrote:
...
The number of pages set all-frozen in the visibility map by a given
vacuum can be useful when analyzing which normal vacuums reduce the
number of pages future aggressive vacuum need to scan.

Also, empty pages that are set all-frozen in the VM don't show up in
the count of pages with newly frozen tuples. When making sense of the
result of visibilitymap_summary() for a relation, it's helpful to know
how many pages were set all-frozen in the VM by each vacuum.

I agree that this data would be useful for analysing the impact of vacuum operations.

The values returned in a case pages are removed (cases where the empty pages are at the end of the table) are a bit confusing.

In an example similar to yours, but with a normal vacuum operation, since that seems to be the most useful case for this feature:

CREATE TABLE the_table (first int, second int) WITH (autovacuum_enabled = false);
INSERT INTO the_table SELECT generate_series(1,1000), 1;
DELETE FROM the_table WHERE first > 50;
VACUUM (VERBOSE) the_table;

pages: 4 removed, 1 remain, 5 scanned (100.00% of total)
tuples: 950 removed, 50 remain, 0 are dead but not yet removable
removable cutoff: 763, which was 1 XIDs old when operation ended
new relfrozenxid: 761, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen. 4 pages set all-frozen in the VM

It looks like 4 pages out of 1 are set all-frozen. So there are -3 to scan for the next aggressive vacuum? The four pages which were set to all frozen are the same four which have been removed from the end of the table.

For an example where the empty pages which are marked all frozen are at the start of the table (deleting values < 950 in the example), the results are more intuitive

pages: 0 removed, 5 remain, 5 scanned (100.00% of total)
tuples: 949 removed, 51 remain, 0 are dead but not yet removable
removable cutoff: 768, which was 0 XIDs old when operation ended
new relfrozenxid: 766, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen. 4 pages set all-frozen in the VM

Can the pages which are removed from the end of the table not be counted towards those set all frozen to make the arithmetic a bit more intuitive for this edge case?

Thanks
Alastair

Re: Count and log pages set all-frozen by vacuum

From
Melanie Plageman
Date:
Thanks for taking a look, Alastair!

On Thu, Oct 31, 2024 at 5:47 AM Alastair Turner <minion@decodable.me> wrote:
>
> The values returned in a case pages are removed (cases where the empty pages are at the end of the table) are a bit
confusing.
>
> In an example similar to yours, but with a normal vacuum operation, since that seems to be the most useful case for
thisfeature: 
>
> CREATE TABLE the_table (first int, second int) WITH (autovacuum_enabled = false);
> INSERT INTO the_table SELECT generate_series(1,1000), 1;
> DELETE FROM the_table WHERE first > 50;
> VACUUM (VERBOSE) the_table;
>
> pages: 4 removed, 1 remain, 5 scanned (100.00% of total)
> tuples: 950 removed, 50 remain, 0 are dead but not yet removable
> removable cutoff: 763, which was 1 XIDs old when operation ended
> new relfrozenxid: 761, which is 1 XIDs ahead of previous value
> frozen: 0 pages from table (0.00% of total) had 0 tuples frozen. 4 pages set all-frozen in the VM
>
> It looks like 4 pages out of 1 are set all-frozen. So there are -3 to scan for the next aggressive vacuum? The four
pageswhich were set to all frozen are the same four which have been removed from the end of the table. 
>
> For an example where the empty pages which are marked all frozen are at the start of the table (deleting values < 950
inthe example), the results are more intuitive 
>
> pages: 0 removed, 5 remain, 5 scanned (100.00% of total)
> tuples: 949 removed, 51 remain, 0 are dead but not yet removable
> removable cutoff: 768, which was 0 XIDs old when operation ended
> new relfrozenxid: 766, which is 1 XIDs ahead of previous value
> frozen: 0 pages from table (0.00% of total) had 0 tuples frozen. 4 pages set all-frozen in the VM
>
> Can the pages which are removed from the end of the table not be counted towards those set all frozen to make the
arithmetica bit more intuitive for this edge case? 

This is a good point. It could be confusing to see that 1 page remains
but 4 were set all-frozen in the VM.
From the perspective of the code, this is because each page is set
all-frozen/all-visible in the VM after it is pruned or vacuumed.
Truncating of the end of the table happens at the end of vacuum --
after all pages have been processed. So, these pages are set
all-frozen in the VM.

I actually don't see a good way that we could accurately decrement the
count. We have LVRelState->removed_pages but we have no idea which of
those pages are all-frozen. We could have
visibilitymap_prepare_truncate() count and return to
RelationTruncate() how many of the truncated pages were all-frozen.
But we have no way of knowing how many of those pages were newly
frozen by this vacuum.

And if we try to track it from the other direction, when freezing
pages, we would have to keep track of all the block numbers of pages
in the relation that were empty and set frozen and then when
truncating the relation find the overlap. That sounds hard and
expensive.

It seems a better solution would be to find a way to document it or
phrase it clearly in the log. It is true that these pages were set
all-frozen in the VM. It is just that some of them were later removed.
And we don't know which ones those are. Is there a way to make this
clear?
And, if not, is it worse than not having the feature at all?

- Melanie



Re: Count and log pages set all-frozen by vacuum

From
Peter Geoghegan
Date:
On Thu, Oct 31, 2024 at 10:22 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> It seems a better solution would be to find a way to document it or
> phrase it clearly in the log. It is true that these pages were set
> all-frozen in the VM. It is just that some of them were later removed.
> And we don't know which ones those are. Is there a way to make this
> clear?

Probably not, but I don't think that it's worth worrying about. ISTM
that it's inevitable that somebody might get confused whenever we
expose implementation details such as these. This particular example
doesn't seem particularly concerning to me.

Fundamentally, the information that you're showing is a precisely
accurate account of the work performed by VACUUM. If somebody is
bothered by the fact that we're setting VM bits for pages that just
get truncated anyway, then they're bothered by the reality of what
VACUUM does -- and not by the instrumentation itself.

Why not just reuse visibilitymap_count for this (and have it count the
number of all-frozen pages when called by heap_vacuum_rel)? That'll
change the nature of the information shown, but that might actually
make it slightly more useful.

--
Peter Geoghegan



Re: Count and log pages set all-frozen by vacuum

From
Melanie Plageman
Date:
On Thu, Oct 31, 2024 at 11:15 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Oct 31, 2024 at 10:22 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > It seems a better solution would be to find a way to document it or
> > phrase it clearly in the log. It is true that these pages were set
> > all-frozen in the VM. It is just that some of them were later removed.
> > And we don't know which ones those are. Is there a way to make this
> > clear?
>
> Probably not, but I don't think that it's worth worrying about. ISTM
> that it's inevitable that somebody might get confused whenever we
> expose implementation details such as these. This particular example
> doesn't seem particularly concerning to me.
>
> Fundamentally, the information that you're showing is a precisely
> accurate account of the work performed by VACUUM. If somebody is
> bothered by the fact that we're setting VM bits for pages that just
> get truncated anyway, then they're bothered by the reality of what
> VACUUM does -- and not by the instrumentation itself.

Makes sense to me. Though, I'm looking at it as a developer.

> Why not just reuse visibilitymap_count for this (and have it count the
> number of all-frozen pages when called by heap_vacuum_rel)? That'll
> change the nature of the information shown, but that might actually
> make it slightly more useful.

I'm biased because I want the count of pages newly set all-frozen in
the VM for another patch. You think exposing the total number of
all-frozen pages at the end of the vacuum is more useful to users?

- Melanie



Re: Count and log pages set all-frozen by vacuum

From
Peter Geoghegan
Date:
On Thu, Oct 31, 2024 at 11:26 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I'm biased because I want the count of pages newly set all-frozen in
> the VM for another patch. You think exposing the total number of
> all-frozen pages at the end of the vacuum is more useful to users?

The emphasis on the work that one particular VACUUM operation
performed doesn't seem like the most relevant thing to users (I get
why you'd care about it in the context of your work, though). What
matters to users is that the overall picture over time is one where
VACUUM doesn't leave an excessive number of pages
not-all-frozen-in-VM.

What if we're just setting the same few pages all-frozen, again and
again? And what about normal (non-aggressive) VACUUMs that effectively
*increase* the number of pages future aggressive VACUUMs need to scan?
As you well know, by setting some pages all-visible (not all-frozen),
VACUUM essentially guarantees that those same pages can only get
frozen by future aggressive VACUUMs. All these factors seem to argue
for using visibilitymap_count for this (which is not to say that I am
opposed to instrumented pages newly marked all-frozen in the VM, if it
makes sense as part of some much larger project).

--
Peter Geoghegan



Re: Count and log pages set all-frozen by vacuum

From
Robert Haas
Date:
On Thu, Oct 31, 2024 at 11:15 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Probably not, but I don't think that it's worth worrying about. ISTM
> that it's inevitable that somebody might get confused whenever we
> expose implementation details such as these. This particular example
> doesn't seem particularly concerning to me.

+1. We could possibly make this less confusing by reworking the output
so that we first talk about what the vacuuming did in one set of log
lines and then talk about what the truncation did afterward. But
that's a lot of work, and I don't feel like it's "must do" work.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Count and log pages set all-frozen by vacuum

From
Alastair Turner
Date:
On Thu, 31 Oct 2024 at 15:26, Melanie Plageman <melanieplageman@gmail.com> wrote:
On Thu, Oct 31, 2024 at 11:15 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Oct 31, 2024 at 10:22 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > It seems a better solution would be to find a way to document it or
> > phrase it clearly in the log. It is true that these pages were set
> > all-frozen in the VM. It is just that some of them were later removed.
> > And we don't know which ones those are. Is there a way to make this
> > clear?
>
> Probably not, but I don't think that it's worth worrying about. ISTM
> that it's inevitable that somebody might get confused whenever we
> expose implementation details such as these. This particular example
> doesn't seem particularly concerning to me.
>
> Fundamentally, the information that you're showing is a precisely
> accurate account of the work performed by VACUUM. If somebody is
> bothered by the fact that we're setting VM bits for pages that just
> get truncated anyway, then they're bothered by the reality of what
> VACUUM does -- and not by the instrumentation itself.

Makes sense to me. Though, I'm looking at it as a developer.

For user consumption, to reduce the number of puzzled questions, I'd suggest adding a line to the log output of the form

visibility map: %u pages set all frozen, up to %u may have been removed from the table

rather than appending the info to the frozen: line of the output. By spelling visibility map out in full it gives the curious user something specific enough to look up. It also at least alerts the user to the fact that the number can't just be subtracted from a total.

Re: Count and log pages set all-frozen by vacuum

From
Robert Haas
Date:
On Thu, Oct 31, 2024 at 11:49 AM Peter Geoghegan <pg@bowt.ie> wrote:
> The emphasis on the work that one particular VACUUM operation
> performed doesn't seem like the most relevant thing to users (I get
> why you'd care about it in the context of your work, though). What
> matters to users is that the overall picture over time is one where
> VACUUM doesn't leave an excessive number of pages
> not-all-frozen-in-VM.

I don't see it quite the same way. I agree that what users are really
concerned about is the excessive number of unfrozen pages in the VM.
But that's not the question here. The question is what should
log_autovacuum_min_duration log. And I think it should log what the
vacuum itself did, not what the state of the table ended up being
around the time the vacuum ended.

And I think there is certainly a use case for knowing how much work of
each particular kind vacuum did. You might for example be trying to
judge whether a particular vacuum was useless. Knowing the cumulative
state of the table around the time the vacuum finished doesn't help
you figure that out; a count of how much work the vacuum itself did
does.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Count and log pages set all-frozen by vacuum

From
Peter Geoghegan
Date:
On Thu, Oct 31, 2024 at 12:02 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I don't see it quite the same way. I agree that what users are really
> concerned about is the excessive number of unfrozen pages in the VM.
> But that's not the question here. The question is what should
> log_autovacuum_min_duration log. And I think it should log what the
> vacuum itself did, not what the state of the table ended up being
> around the time the vacuum ended.

I don't fundamentally disagree that the actual work performed by
VACUUM could also be useful. It's a question of emphasis.

FWIW I do disagree with the principle that log_autovacuum_min_duration
should only log things that are work performed by VACUUM. While most
things that it reports on currently do work that way, that in itself
doesn't seem like it should preclude reporting on visibilitymap_count
now.

--
Peter Geoghegan



Re: Count and log pages set all-frozen by vacuum

From
Melanie Plageman
Date:
On Thu, Oct 31, 2024 at 11:56 AM Alastair Turner <minion@decodable.me> wrote:
>
> For user consumption, to reduce the number of puzzled questions, I'd suggest adding a line to the log output of the
form
>
> visibility map: %u pages set all frozen, up to %u may have been removed from the table
>
> rather than appending the info to the frozen: line of the output. By spelling visibility map out in full it gives the
curioususer something specific enough to look up. It also at least alerts the user to the fact that the number can't
justbe subtracted from a total. 

Would it also be useful to have the number set all-visible? It seems
like if we added a new line prefixed with visibility map, it ought to
include all-visible and all-frozen then.
Something like this:
 visibility map: %u pages set all-visible, %u pages set all-frozen.

I find it more confusing to say "up to X may have been removed from
the table." It's unclear what that means -- especially since we
already have "pages removed" in another part of the log message.

We actually could call visibilitymap_count() at the beginning of the
vacuum and then log the difference between that and the results of
calling it after finishing the vacuum. We currently call it after
truncating the table anyway. That won't tell you how many pages were
set all-frozen by this vacuum, as pages could have been unfrozen by
DML that occurred after the page was vacuumed. It might be useful in
addition to the line about the visibility map.

This is somewhat in conflict with Robert and Peter's points about how
autovacuum logging should be about what this vacuum did. But, we do
have lines that talk about the before and after values:

new relfrozenxid: 748, which is 3 XIDs ahead of previous value

So, we could do something like:
visibility map before: %u pages all-visible, %u pages all-frozen
visibility map after: %u pages all-visible, %u pages all-frozen
or
visibility map after: %u pages all-visible (%u more than before), %u
pages all-frozen (%u more than before)

I still prefer adding how many pages were set all-frozen by this vacuum, though.

- Melanie



Re: Count and log pages set all-frozen by vacuum

From
Alastair Turner
Date:
On Thu, 31 Oct 2024 at 18:51, Melanie Plageman <melanieplageman@gmail.com> wrote:

Would it also be useful to have the number set all-visible? It seems
like if we added a new line prefixed with visibility map, it ought to
include all-visible and all-frozen then.
Something like this:
 visibility map: %u pages set all-visible, %u pages set all-frozen.

I find it more confusing to say "up to X may have been removed from
the table." It's unclear what that means -- especially since we
already have "pages removed" in another part of the log message.
 
Yeah, on looking at it again, that does seem to make things worse.

We actually could call visibilitymap_count() at the beginning of the
vacuum and then log the difference between that and the results of
calling it after finishing the vacuum. We currently call it after
truncating the table anyway. That won't tell you how many pages were
set all-frozen by this vacuum, as pages could have been unfrozen by
DML that occurred after the page was vacuumed. It might be useful in
addition to the line about the visibility map.

This is somewhat in conflict with Robert and Peter's points about how
autovacuum logging should be about what this vacuum did. But, we do
have lines that talk about the before and after values:

new relfrozenxid: 748, which is 3 XIDs ahead of previous value

So, we could do something like:
visibility map before: %u pages all-visible, %u pages all-frozen
visibility map after: %u pages all-visible, %u pages all-frozen
or
visibility map after: %u pages all-visible (%u more than before), %u
pages all-frozen (%u more than before)

I still prefer adding how many pages were set all-frozen by this vacuum, though.

I also like the idea of showing how many pages were set all-frozen by this vacuum (which meets Robert's requirement for figuring out if a vacuum operation did anything useful). The values for pages marked all visible and all frozen can fluctuate for a number of reasons, even, as you point out, from concurrent activity during the vacuum. This is different from relfrozenxid which is a kind of high water mark. So I think the output styles can reasonably be different.

visibility map: %u pages all-visible (%u marked by this operation), %u pages all-frozen (%u marked by this operation)

seems to support everyone's requirements

Re: Count and log pages set all-frozen by vacuum

From
Masahiko Sawada
Date:
On Fri, Nov 1, 2024 at 5:12 AM Alastair Turner <minion@decodable.me> wrote:
>
> On Thu, 31 Oct 2024 at 18:51, Melanie Plageman <melanieplageman@gmail.com> wrote:
>>
>>
>> Would it also be useful to have the number set all-visible? It seems
>> like if we added a new line prefixed with visibility map, it ought to
>> include all-visible and all-frozen then.
>> Something like this:
>>  visibility map: %u pages set all-visible, %u pages set all-frozen.
>>
>> I find it more confusing to say "up to X may have been removed from
>> the table." It's unclear what that means -- especially since we
>> already have "pages removed" in another part of the log message.
>
>
> Yeah, on looking at it again, that does seem to make things worse.
>
>> We actually could call visibilitymap_count() at the beginning of the
>> vacuum and then log the difference between that and the results of
>> calling it after finishing the vacuum. We currently call it after
>> truncating the table anyway. That won't tell you how many pages were
>> set all-frozen by this vacuum, as pages could have been unfrozen by
>> DML that occurred after the page was vacuumed. It might be useful in
>> addition to the line about the visibility map.
>>
>> This is somewhat in conflict with Robert and Peter's points about how
>> autovacuum logging should be about what this vacuum did. But, we do
>> have lines that talk about the before and after values:
>>
>> new relfrozenxid: 748, which is 3 XIDs ahead of previous value
>>
>> So, we could do something like:
>> visibility map before: %u pages all-visible, %u pages all-frozen
>> visibility map after: %u pages all-visible, %u pages all-frozen
>> or
>> visibility map after: %u pages all-visible (%u more than before), %u
>> pages all-frozen (%u more than before)
>>
>> I still prefer adding how many pages were set all-frozen by this vacuum, though.
>>
> I also like the idea of showing how many pages were set all-frozen by this vacuum (which meets Robert's requirement
forfiguring out if a vacuum operation did anything useful). 

+1

> visibility map: %u pages all-visible (%u marked by this operation), %u pages all-frozen (%u marked by this operation)

Having "marked by this operation" twice seems to be redundant. How
about something like the output below?

visibility map: %u pages set all-visible (%u pages total), %u pages
set all-frozen (%u pages total)

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Count and log pages set all-frozen by vacuum

From
Robert Haas
Date:
On Fri, Nov 1, 2024 at 12:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Having "marked by this operation" twice seems to be redundant. How
> about something like the output below?
>
> visibility map: %u pages set all-visible (%u pages total), %u pages
> set all-frozen (%u pages total)

For me, the meaning of that isn't clear.

I think this is the wrong direction, anyway. If someone says "hey, we
should add X to the output" and someone else says "we should add Y
instead," it doesn't follow that the right thing to do is to add both.
I happen to think that the right answer is X, both because X of my
understanding of the purpose of this log message, and also because X
is in the service of Melanie's larger goal and Y is not. But I also
feel like bikeshedding the patch that somebody should have written
instead of reviewing the one they actually wrote is to be avoided. Of
course, sometimes there's no getting around the fact that the person
chose to do something that didn't really make sense, and then it's
reasonable to suggest alternatives. But here, what was actually done
does make sense and is the first choice of some people. What is
proposed can be added now, provided that it actually gets some review,
and the other thing can be added later if someone wants to do the work
and if no problems are discovered, but it isn't Melanie's job to add
data that isn't needed for her project.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Count and log pages set all-frozen by vacuum

From
Masahiko Sawada
Date:
On Fri, Nov 1, 2024 at 9:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Nov 1, 2024 at 12:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Having "marked by this operation" twice seems to be redundant. How
> > about something like the output below?
> >
> > visibility map: %u pages set all-visible (%u pages total), %u pages
> > set all-frozen (%u pages total)
>
> For me, the meaning of that isn't clear.
>
> I think this is the wrong direction, anyway. If someone says "hey, we
> should add X to the output" and someone else says "we should add Y
> instead," it doesn't follow that the right thing to do is to add both.
> I happen to think that the right answer is X, both because X of my
> understanding of the purpose of this log message, and also because X
> is in the service of Melanie's larger goal and Y is not. But I also
> feel like bikeshedding the patch that somebody should have written
> instead of reviewing the one they actually wrote is to be avoided. Of
> course, sometimes there's no getting around the fact that the person
> chose to do something that didn't really make sense, and then it's
> reasonable to suggest alternatives. But here, what was actually done
> does make sense and is the first choice of some people. What is
> proposed can be added now, provided that it actually gets some review,
> and the other thing can be added later if someone wants to do the work
> and if no problems are discovered, but it isn't Melanie's job to add
> data that isn't needed for her project.

Agreed.

I think we agreed with what the patches proposed by Melanie do, so
let's focus on these patches on this thread. We can add other
information later if we need.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Count and log pages set all-frozen by vacuum

From
Melanie Plageman
Date:
On Fri, Nov 1, 2024 at 5:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I think we agreed with what the patches proposed by Melanie do, so
> let's focus on these patches on this thread. We can add other
> information later if we need.

Thanks for the feedback and input.
So, currently what I have is a line for updates to the visibility map:

visibility map: 5 pages set all-visible, 4 pages set all-frozen.

Because this patch set is a prerequisite for the work I proposed over
in [1], Andres happened to review this patch in the course of
reviewing the larger patch set. He brought up yet a different
perspective that I hadn't thought of [2]. He says:

> Hm. Why is it interesting to log VM changes separately from the state changes
> of underlying pages? Wouldn't it e.g. be more intersting to log the number of
> empty pages that vacuum [re-]discovered? I've a bit hard time seeing how a
> user could take advantage of this.

I think he is saying that the updates to the VM to set pages
all-frozen belong with the "frozen" line of vacuum log output:
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen

Personally, I do think pages set all-visible/all-frozen in the VM is
interesting to users -- when determining how much useful work a given
vacuum did.
And the "frozen" line is really about freezing tuples -- there can be
pages with newly frozen tuples that aren't set all-frozen in the VM
and pages set all-frozen in the VM that don't have newly frozen tuples
(because they are empty).

I do agree that logging the number of empty pages vacuum rediscovered
could be useful (maybe in a "freespace" prefixed line about freespace
and empty pages vacuum found). But, I don't think that is a reason not
to add VM updates to the vacuum log output.

So, after all of the discussion on this thread, I propose the existing
vacuum log output (as on master) with the addition of one line about
pages newly set all-visible/all-frozen by this vacuum:

visibility map: 5 pages set all-visible, 4 pages set all-frozen.

- Melanie

[1] https://www.postgresql.org/message-id/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu