Thread: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

[HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Claudio Freire
As discussed in the vacuum ring buffer thread:

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs. Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point
at which whole ranges of the heap have been thoroughly cleaned, and
we can expect no further updates to those ranges of the FSM save
for concurrent activity. When there are no indexes, and thus no
index passes, run partial FSM vacuums every 8GB of dirtied pages
or 1/8th of the relation, whichever is larger. This allows some
partial work to be made visible without incurring quadratic cost.

In any case, FSMs are small relative to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the addition of the FSM
vacuums is unlikely to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at
the beginning of the VACUUM to finish up any work left over from
prior canceled vacuum runs, something that is common in highly
contended tables when running only autovacuum.

Autovacuum cancellation is thus handled by updating the FSM first
thing when autovacuum retries vacuuming a relation. I attempted to
add an autovacuum work item for performing the FSM update shortly
after the cancel, but that approach had issues, so I abandoned it.

For one, it would sometimes crash when adding the work item from
inside autovacuum itself. I didn't find the cause of the crash, but I
suspect AutoVacuumRequestWork was being called in a context where it
was not safe. In any case, work items didn't seem suited to our
purposes anyway: they are only ever processed after processing all
tables in the database, which could take ages; they are limited to
256, which would make the approach troublesome for databases where
more than 256 tables trigger this case; and de-duplicating work items
had quadratic cost without major refactoring of the feature.

So, patch attached. I'll add it to the next CF as well.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Jing Wang
A few general comments.

+   FreeSpaceMapVacuum(onerel, 64);

Just want to know why '64' is used here? It's better to give a description.

+    else
+   {
+       newslot = fsm_get_avail(page, 0);
+   }

Since there is only one line in the else the brackets are not needed. Also there is one extra space before the else which should be removed.

+       if (nindexes == 0 &&
+           (vacuumed_pages_at_fsm_vac - vacuumed_pages) > vacuum_fsm_every_pages)
+       {
+           /* Vacuum the Free Space Map */
+           FreeSpaceMapVacuum(onerel, 0);
+           vacuumed_pages_at_fsm_vac = vacuumed_pages;
+       }

vacuumed_pages_at_fsm_vac is initialised to zero, so there seems to be no chance of ever entering the branch.

Regards,
Jing Wang
Fujitsu Australia

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Michael Paquier
On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang <jingwangian@gmail.com> wrote:
> A few general comments.
>
> +   FreeSpaceMapVacuum(onerel, 64);
>
> Just want to know why '64' is used here? It's better to give a description.
>
> +    else
> +   {
> +       newslot = fsm_get_avail(page, 0);
> +   }
>
> Since there is only one line in the else the brackets are not needed. Also
> there is one extra space before the else which should be removed.
>
>
> +       if (nindexes == 0 &&
> +           (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> vacuum_fsm_every_pages)
> +       {
> +           /* Vacuum the Free Space Map */
> +           FreeSpaceMapVacuum(onerel, 0);
> +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
> +       }
>
> vacuumed_pages_at_fsm_vac is initialised to zero, so there seems to be no
> chance of ever entering the branch.

The patch presented still applies, and there was a review two days
ago. That leaves little time for the author to respond, so I am
moving this patch to the next CF, keeping its status of waiting on
author.
-- 
Michael


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Stephen Frost
Greetings Claudio,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang <jingwangian@gmail.com> wrote:
> > A few general comments.
> >
> > +   FreeSpaceMapVacuum(onerel, 64);
> >
> > Just want to know why '64' is used here? It's better to give a description.
> >
> > +    else
> > +   {
> > +       newslot = fsm_get_avail(page, 0);
> > +   }
> >
> > Since there is only one line in the else the brackets are not needed. Also
> > there is one extra space before the else which should be removed.
> >
> >
> > +       if (nindexes == 0 &&
> > +           (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> > vacuum_fsm_every_pages)
> > +       {
> > +           /* Vacuum the Free Space Map */
> > +           FreeSpaceMapVacuum(onerel, 0);
> > +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
> > +       }
> >
> > vacuumed_pages_at_fsm_vac is initialised to zero, so there seems to be no
> > chance of ever entering the branch.
>
> The patch presented still applies, and there was a review two days
> ago. That leaves little time for the author to respond, so I am
> moving this patch to the next CF, keeping its status of waiting on
> author.

Looks like this patch probably still applies and the changes suggested
above seem pretty small. Any chance you can provide an updated patch
soon, Claudio, so that it can be reviewed properly during this
commitfest?

Thanks!

Stephen


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Claudio Freire


On Sat, Jan 6, 2018 at 7:33 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Greetings Claudio,
>
> * Michael Paquier (michael.paquier@gmail.com) wrote:
> > On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang <jingwangian@gmail.com> wrote:
> > > A few general comments.
> > >
> > > +   FreeSpaceMapVacuum(onerel, 64);
> > >
> > > Just want to know why '64' is used here? It's better to give a description.
> > >
> > > +    else
> > > +   {
> > > +       newslot = fsm_get_avail(page, 0);
> > > +   }
> > >
> > > Since there is only one line in the else the brackets are not needed. Also
> > > there is one extra space before the else which should be removed.
> > >
> > >
> > > +       if (nindexes == 0 &&
> > > +           (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> > > vacuum_fsm_every_pages)
> > > +       {
> > > +           /* Vacuum the Free Space Map */
> > > +           FreeSpaceMapVacuum(onerel, 0);
> > > +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
> > > +       }
> > >
> > > vacuumed_pages_at_fsm_vac is initialised to zero, so there seems to be no
> > > chance of ever entering the branch.
> >
> > The patch presented still applies, and there was a review two days
> > ago. That leaves little time for the author to respond, so I am
> > moving this patch to the next CF, keeping its status of waiting on
> > author.
>
> Looks like this patch probably still applies and the changes suggested
> above seem pretty small. Any chance you can provide an updated patch
> soon, Claudio, so that it can be reviewed properly during this
> commitfest?

I failed to notice the comments among the list's noise, sorry.

I'll get on to it now.

On Mon, Nov 27, 2017 at 2:39 AM, Jing Wang <jingwangian@gmail.com> wrote:
> A few general comments.
>
> +   FreeSpaceMapVacuum(onerel, 64);
>
> Just want to know why '64' is used here? It's better to give a description.

It's just a random cutoff point.

The point of that vacuum run is to mark free space early, to avoid
needless relation extension before long vacuum runs finish. This only
happens if the FSM is mostly zeroes; if there's free space in there,
it will be used.

To make this run fast, since it's all redundant work before the final
FSM vacuum pass, branches with more than that amount of free space are
skipped. There's little point in vacuuming those early, since they
already contain free space. This helps avoid the quadratic cost of
vacuuming the FSM every N dead tuples, since already-vacuumed branches
will have free space and will thus be skipped. It could still be
quadratic in the worst case, but it avoids that often enough.

As for the value itself, it's an arbitrary choice. In between index
passes, we have a rather precise cutoff we can use to avoid this
redundant work. But in the first run we don't, so I made an arbitrary
choice there that felt right. I have no empirical performance data
about alternative values.

> +    else
> +   {
> +       newslot = fsm_get_avail(page, 0);
> +   }
>
> Since there is only one line in the else the brackets are not needed. Also there is one extra space before the else which should be removed.

Ok.

> +       if (nindexes == 0 &&
> +           (vacuumed_pages_at_fsm_vac - vacuumed_pages) > vacuum_fsm_every_pages)
> +       {
> +           /* Vacuum the Free Space Map */
> +           FreeSpaceMapVacuum(onerel, 0);
> +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
> +       }
>
> vacuumed_pages_at_fsm_vac is initialised to zero, so there seems to be no chance of ever entering the branch.

You're right, the difference there is backwards.

Attached patch with the proposed changes.



Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Masahiko Sawada
On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
> recursing into branches that already contain enough free space, to
> avoid having to traverse the whole FSM and thus induce quadratic
> costs. Intermediate FSM vacuums are only supposed to make enough
> free space visible to avoid extension until the final (non-partial)
> FSM vacuum.

Hmm, I think this resolves part of the issue. How about calling
AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified,
giving the relid that we were vacuuming but could not complete as a
new autovacuum work-item? The new autovacuum work-item makes the
worker vacuum the FSMs of the given relation and its indices. That
way, we can ensure that the FSM gets vacuumed by the cancelled
autovacuum process or other autovacuum processes. Since a work-item
can be handled by another autovacuum process, I think the 256
work-item limit would not be a problem. Or it might be better to make
the autovacuum launcher launch a worker process if there is a pending
work-item in a database, even if no table needs to be
vacuumed/analyzed.

> For one, it would sometimes crash when adding the work item from
> inside autovacuum itself. I didn't find the cause of the crash, but I suspect
> AutoVacuumRequestWork was being called in a context where it
> was not safe.

Perhaps the cause of this is that autovacuum work-items are
implemented using DSA, which was changed by commit
31ae1638ce35c23979f9bcbb92c6bb51744dbccb.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Claudio Freire
On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
>> recursing into branches that already contain enough free space, to
>> avoid having to traverse the whole FSM and thus induce quadratic
>> costs. Intermediate FSM vacuums are only supposed to make enough
>> free space visible to avoid extension until the final (non-partial)
>> FSM vacuum.
>
> Hmm, I think this resolves part of the issue. How about calling
> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified,
> giving the relid that we were vacuuming but could not complete as a
> new autovacuum work-item? The new autovacuum work-item makes the
> worker vacuum the FSMs of the given relation and its indices.

Well, I tried that in fact, as I mentioned in the OP.

I abandoned it due to the conjunction of the 2 main blockers I found
and mentioned there. In essence, those issues defeat the purpose of
the patch (to get the free space visible ASAP).

Don't forget, this is aimed at cases where autovacuum of a single
relation takes a very long time. That is, very big relations. Maybe
days, like in my case. A whole autovacuum cycle can take weeks, so
delaying FSM vacuum that much is not good, and using work items still
causes those delays, not to mention the segfaults.

> That way, we
> can ensure that FSM gets vacuumed by the cancelled autovacuum process
> or other autovacuum processes. Since a work-item can be handled by
> other autovacuum process I think 256 work-item limit would not be a
> problem.

Why do you think it wouldn't? In particular if you take into account
the above. If you have more than 256 relations in the cluster, it
could very well happen that you've queued the maximum amount and no
autovac worker has had a chance to take a look at them, because
they're all stuck vacuuming huge relations.

Not to mention the need to de-duplicate work items. We wouldn't want
to request repeated FSM vacuums, or worse, queue an FSM vacuum of a
single table 256 times and fill up the queue with redundant items.
With the current structure, de-duplication is O(N), so if we wanted to
raise the limit of 256 work items, we'd need a structure that would
let us de-duplicate in less than O(N). In essence, it's a ton of work
for very little gain. Hence I abandoned it.

> Or it might be better to make autovacuum launcher launch
> worker process if there is pending work-item in a database even if
> there is no table needs to be vacuumed/analyzed.

Quite a must, in fact.

>> For one, it would sometimes crash when adding the work item from
>> inside autovacuum itself. I didn't find the cause of the crash, but I suspect
>> AutoVacuumRequestWork was being called in a context where it
>> was not safe.
>
> Perhaps the cause of this is that autovacuum work-items are
> implemented using DSA, which was changed by commit
> 31ae1638ce35c23979f9bcbb92c6bb51744dbccb.

I thought about that. Perhaps. But the work-item approach didn't fail
just because of the segfaults. It's also that it doesn't address the
free space visibility issue quickly enough.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Masahiko Sawada
On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
>>> recursing into branches that already contain enough free space, to
>>> avoid having to traverse the whole FSM and thus induce quadratic
>>> costs. Intermediate FSM vacuums are only supposed to make enough
>>> free space visible to avoid extension until the final (non-partial)
>>> FSM vacuum.
>>
>> Hmm, I think this resolves part of the issue. How about calling
>> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified,
>> giving the relid that we were vacuuming but could not complete as a
>> new autovacuum work-item? The new autovacuum work-item makes the
>> worker vacuum the FSMs of the given relation and its indices.
>
> Well, I tried that in fact, as I mentioned in the OP.
>
> I abandoned it due to the conjunction of the 2 main blockers I found
> and mentioned there. In essence, those issues defeat the purpose of
> the patch (to get the free space visible ASAP).
>
> Don't forget, this is aimed at cases where autovacuum of a single
> relation takes a very long time. That is, very big relations. Maybe
> days, like in my case. A whole autovacuum cycle can take weeks, so
> delaying FSM vacuum that much is not good, and using work items still
> causes those delays, not to mention the segfaults.

Yeah, I agree with vacuuming the FSM more frequently, because it can
prevent table bloat from concurrent modifications. But to prevent
table bloat caused by cancellation of autovacuum, I guess we need
more. This proposal gives us the ability to "maybe" prevent table
bloat due to cancellation of autovacuum. Since we can know that
autovacuum was cancelled, I'd like to have a way to make sure the FSM
gets vacuumed even if vacuum is cancelled. Thoughts?

Also, the patch always vacuums the FSM at the beginning of the vacuum
with a threshold, but that is useless if the table has already been
properly vacuumed. I don't think it's a good idea to add an additional
step that is only "maybe" efficient, because vacuum is already heavy.

>
>> That way, we
>> can ensure that FSM gets vacuumed by the cancelled autovacuum process
>> or other autovacuum processes. Since a work-item can be handled by
>> other autovacuum process I think 256 work-item limit would not be a
>> problem.
>
> Why do you think it wouldn't? In particular if you take into account
> the above. If you have more than 256 relations in the cluster, it
> could very well happen that you've queued the maximum amount and no
> autovac worker has had a chance to take a look at them, because
> they're all stuck vacuuming huge relations.
>
> Not to mention the need to de-duplicate work items. We wouldn't want
> to request repeated FSM vacuums, or worse, queue an FSM vacuum of a
> single table 256 times and fill up the queue with redundant items.
> With the current structure, de-duplication is O(N), so if we wanted to
> raise the limit of 256 work items, we'd need a structure that would
> let us de-duplicate in less than O(N). In essence, it's a ton of work
> for very little gain. Hence I abandoned it.

I'd missed something. Agreed with you.

On the details of the patch,

--- a/src/backend/storage/freespace/indexfsm.c
+++ b/src/backend/storage/freespace/indexfsm.c
@@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
 void
 IndexFreeSpaceMapVacuum(Relation rel)
 {
-  FreeSpaceMapVacuum(rel);
+  FreeSpaceMapVacuum(rel, 0);
 }

@@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr,
bool *eof_p)
      {
         int         child_avail;

+        /* Tree pruning for partial vacuums */
+        if (threshold)
+        {
+           child_avail = fsm_get_avail(page, slot);
+           if (child_avail >= threshold)
+              continue;
+        }

Don't we skip all fsm pages if we set the threshold to 0?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Claudio Freire
On Thu, Feb 1, 2018 at 9:34 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
>>>> recursing into branches that already contain enough free space, to
>>>> avoid having to traverse the whole FSM and thus induce quadratic
>>>> costs. Intermediate FSM vacuums are only supposed to make enough
>>>> free space visible to avoid extension until the final (non-partial)
>>>> FSM vacuum.
>>>
>>> Hmm, I think this resolves part of the issue. How about calling
>>> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified,
>>> giving the relid that we were vacuuming but could not complete as a
>>> new autovacuum work-item? The new autovacuum work-item makes the
>>> worker vacuum the FSMs of the given relation and its indices.
>>
>> Well, I tried that in fact, as I mentioned in the OP.
>>
>> I abandoned it due to the conjunction of the 2 main blockers I found
>> and mentioned there. In essence, those issues defeat the purpose of
>> the patch (to get the free space visible ASAP).
>>
>> Don't forget, this is aimed at cases where autovacuum of a single
>> relation takes a very long time. That is, very big relations. Maybe
>> days, like in my case. A whole autovacuum cycle can take weeks, so
>> delaying FSM vacuum that much is not good, and using work items still
>> causes those delays, not to mention the segfaults.
>
> Yeah, I agree with vacuuming the FSM more frequently, because it can
> prevent table bloat from concurrent modifications. But to prevent
> table bloat caused by cancellation of autovacuum, I guess we need
> more. This proposal gives us the ability to "maybe" prevent table
> bloat due to cancellation of autovacuum. Since we can know that
> autovacuum was cancelled, I'd like to have a way to make sure the FSM
> gets vacuumed even if vacuum is cancelled. Thoughts?

After autovacuum gets cancelled, the next time it wakes up it will
retry vacuuming the cancelled relation. That's because a cancelled
autovacuum doesn't update the last-vacuumed stats.

So the timing between an autovacuum work item and the next retry for
that relation is more or less an autovacuum nap time, except perhaps
in the case where many vacuums get cancelled, and they have to be
queued.

That's why an initial FSM vacuum makes sense. It has a similar timing
to the autovacuum work item, it has the benefit that it can be
triggered manually (manual vacuum), and it's cheap and efficient.

> Also, the patch always vacuums the FSM at the beginning of the vacuum
> with a threshold, but that is useless if the table has already been
> properly vacuumed. I don't think it's a good idea to add an additional
> step that is only "maybe" efficient, because vacuum is already heavy.

FSMs are several orders of magnitude smaller than the tables
themselves. A TB-sized table I have here has a 95MB FSM. If you add
threshold skipping, that initial FSM vacuum *will* be efficient.

By definition, the FSM will be less than 1/4000th of the table, so
that initial FSM pass takes less than 1/4000th of the whole vacuum.
Considerably less, considering the simplicity of the task.

> On the details of the patch,
>
> --- a/src/backend/storage/freespace/indexfsm.c
> +++ b/src/backend/storage/freespace/indexfsm.c
> @@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
>  void
>  IndexFreeSpaceMapVacuum(Relation rel)
>  {
> -  FreeSpaceMapVacuum(rel);
> +  FreeSpaceMapVacuum(rel, 0);
>  }
>
> @@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr,
> bool *eof_p)
>       {
>          int         child_avail;
>
> +        /* Tree pruning for partial vacuums */
> +        if (threshold)
> +        {
> +           child_avail = fsm_get_avail(page, slot);
> +           if (child_avail >= threshold)
> +              continue;
> +        }
>
> Don't we skip all fsm pages if we set the threshold to 0?

No, that doesn't skip anything if threshold is 0, since it never gets
to the continue, which is what skips nodes.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Masahiko Sawada
On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Feb 1, 2018 at 9:34 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Mon, Jan 29, 2018 at 11:31 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> On Mon, Jan 29, 2018 at 4:12 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Sat, Jul 29, 2017 at 9:42 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>>> Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
>>>>> recursing into branches that already contain enough free space, to
>>>>> avoid having to traverse the whole FSM and thus induce quadratic
>>>>> costs. Intermediate FSM vacuums are only supposed to make enough
>>>>> free space visible to avoid extension until the final (non-partial)
>>>>> FSM vacuum.
>>>>
>>>> Hmm, I think this resolves part of the issue. How about calling
>>>> AutoVacuumRequestWork() in PG_CATCH() if VACOPT_VACUUM is specified,
>>>> giving the relid that we were vacuuming but could not complete as a
>>>> new autovacuum work-item? The new autovacuum work-item makes the
>>>> worker vacuum the FSMs of the given relation and its indices.
>>>
>>> Well, I tried that in fact, as I mentioned in the OP.
>>>
>>> I abandoned it due to the conjunction of the 2 main blockers I found
>>> and mentioned there. In essence, those issues defeat the purpose of
>>> the patch (to get the free space visible ASAP).
>>>
>>> Don't forget, this is aimed at cases where autovacuum of a single
>>> relation takes a very long time. That is, very big relations. Maybe
>>> days, like in my case. A whole autovacuum cycle can take weeks, so
>>> delaying FSM vacuum that much is not good, and using work items still
>>> causes those delays, not to mention the segfaults.
>>
>> Yeah, I agree with vacuuming the FSM more frequently, because it can
>> prevent table bloat from concurrent modifications. But to prevent
>> table bloat caused by cancellation of autovacuum, I guess we need
>> more. This proposal gives us the ability to "maybe" prevent table
>> bloat due to cancellation of autovacuum. Since we can know that
>> autovacuum was cancelled, I'd like to have a way to make sure the FSM
>> gets vacuumed even if vacuum is cancelled. Thoughts?
>
> After autovacuum gets cancelled, the next time it wakes up it will
> retry vacuuming the cancelled relation. That's because a cancelled
> autovacuum doesn't update the last-vacuumed stats.
>
> So the timing between an autovacuum work item and the next retry for
> that relation is more or less an autovacuum nap time, except perhaps
> in the case where many vacuums get cancelled, and they have to be
> queued.

I think that's not true if there are multiple databases.

>
> That's why an initial FSM vacuum makes sense. It has a similar timing
> to the autovacuum work item, it has the benefit that it can be
> triggered manually (manual vacuum), and it's cheap and efficient.
>
>> Also, the patch always vacuums the FSM at the beginning of the vacuum
>> with a threshold, but that is useless if the table has already been
>> properly vacuumed. I don't think it's a good idea to add an additional
>> step that is only "maybe" efficient, because vacuum is already heavy.
>
> FSMs are several orders of magnitude smaller than the tables
> themselves. A TB-sized table I have here has a 95MB FSM. If you add
> threshold skipping, that initial FSM vacuum *will* be efficient.
>
> By definition, the FSM will be less than 1/4000th of the table, so
> that initial FSM pass takes less than 1/4000th of the whole vacuum.
> Considerably less considering the simplicity of the task.

I agree the FSM is much smaller than the heap, and vacuuming the FSM
will not be comparatively heavy, but I'm afraid that vacuum will get
heavier in the future if we pile up improvements like this that are
small but might not be efficient. For example, a feature for reporting
the last vacuum status has been proposed[1]. I wonder if we can use it
to determine whether to do the FSM vacuum at the beginning of vacuum.

>> On the details of the patch,
>>
>> --- a/src/backend/storage/freespace/indexfsm.c
>> +++ b/src/backend/storage/freespace/indexfsm.c
>> @@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
>>  void
>>  IndexFreeSpaceMapVacuum(Relation rel)
>>  {
>> -  FreeSpaceMapVacuum(rel);
>> +  FreeSpaceMapVacuum(rel, 0);
>>  }
>>
>> @@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr,
>> bool *eof_p)
>>       {
>>          int         child_avail;
>>
>> +        /* Tree pruning for partial vacuums */
>> +        if (threshold)
>> +        {
>> +           child_avail = fsm_get_avail(page, slot);
>> +           if (child_avail >= threshold)
>> +              continue;
>> +        }
>>
>> Don't we skip all fsm pages if we set the threshold to 0?
>
> No, that doesn't skip anything if threshold is 0, since it never gets
> to the continue, which is what skips nodes.

Oops, yes you're right. Sorry for the noise.

[1] https://www.postgresql.org/message-id/20171010.192616.108347483.horiguchi.kyotaro%40lab.ntt.co.jp

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Claudio Freire
On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> After autovacuum gets cancelled, the next time it wakes up it will
>> retry vacuuming the cancelled relation. That's because a cancelled
>> autovacuum doesn't update the last-vacuumed stats.
>>
>> So the timing between an autovacuum work item and the next retry for
>> that relation is more or less an autovacuum nap time, except perhaps
>> in the case where many vacuums get cancelled, and they have to be
>> queued.
>
> I think that's not true if there are multiple databases.

I'd have to re-check.

The loop is basically, IIRC:

while(1) { vacuum db ; work items ; nap }

Now, if that's vacuum one db, not all, and if the decision on the
second run doesn't pick the same db because that big table failed to
be vacuumed, then you're right.

In that case we could add the FSM vacuum as a work item *in addition*
to what this patch does. If the work item queue is full and the FSM
vacuum doesn't happen, it'd be no worse than with the patch as-is.

Is that what you suggest?

With that in mind, I'm noticing WorkItems have an avw_database field
that isn't checked by do_autovacuum. Is that right? Shouldn't only
work items that belong to the database being autovacuumed be
processed?

>> That's why an initial FSM vacuum makes sense. It has a similar timing
>> to the autovacuum work item, it has the benefit that it can be
>> triggered manually (manual vacuum), and it's cheap and efficient.
>>
>>> Also, the patch always vacuums the FSM at the beginning of the vacuum
>>> with a threshold, but that is useless if the table has already been
>>> properly vacuumed. I don't think it's a good idea to add an additional
>>> step that is only "maybe" efficient, because vacuum is already heavy.
>>
>> FSMs are several orders of magnitude smaller than the tables
>> themselves. A TB-sized table I have here has a 95MB FSM. If you add
>> threshold skipping, that initial FSM vacuum *will* be efficient.
>>
>> By definition, the FSM will be less than 1/4000th of the table, so
>> that initial FSM pass takes less than 1/4000th of the whole vacuum.
>> Considerably less considering the simplicity of the task.
>
> I agree the FSM is much smaller than the heap, and vacuuming the FSM
> will not be comparatively heavy, but I'm afraid that vacuum will get
> heavier in the future if we pile up improvements like this that are
> small but might not be efficient. For example, a feature for reporting
> the last vacuum status has been proposed[1]. I wonder if we can use it
> to determine whether to do the FSM vacuum at the beginning of vacuum.

Yes, such a feature would allow skipping that initial FSM vacuum. That
can be improved in a future patch, if that proposed feature gets
merged. This patch can be treated independently from that in the
meantime, don't you think?


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Claudio Freire
On Mon, Feb 5, 2018 at 2:55 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> With that in mind, I'm noticing WorkItems have an avw_database field
> that isn't checked by do_autovacuum. Is that right? Shouldn't only
> work items that belong to the database being autovacuumed be processed?

NVM. I had to git pull, it's fixed in head.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> After autovacuum gets cancelled, the next time it wakes up it will
>>> retry vacuuming the cancelled relation. That's because a cancelled
>>> autovacuum doesn't update the last-vacuumed stats.
>>>
>>> So the timing between an autovacuum work item and the next retry for
>>> that relation is more or less an autovacuum nap time, except perhaps
>>> in the case where many vacuums get cancelled, and they have to be
>>> queued.
>>
>> I think that's not true if there are multiple databases.
>
> I'd have to re-check.
>
> The loop is basically, IIRC:
>
> while(1) { vacuum db ; work items ; nap }
>
> Now, if that's vacuum one db, not all, and if the decision on the
> second run doesn't pick the same db because that big table failed to
> be vacuumed, then you're right.
>
> In that case we could add the FSM vacuum as a work item *in addition*
> to what this patch does. If the work item queue is full and the FSM
> vacuum doesn't happen, it'd be no worse than with the patch as-is.
>
> Is that what you suggest?

That's one of my suggestions. I should have considered this issue for
each case separately. To be clear, let me summarize each case.

For tables with indexes, the heap's FSM is vacuumed after
lazy_vacuum_heap(). Therefore I think the case where a table got
vacuumed but its FSM didn't can only happen if autovacuum gets
cancelled before or while vacuuming the FSM. So it solves the problem
in most cases. Also, we could use the vacuum progress information to
check whether we should vacuum the table's FSM after a cancellation.
For vacuuming the FSM of an index, we might have to consider doing it
after lazy_vacuum_index().

For tables with no indexes it is more complicated; as with indexed
tables we can vacuum the table's FSM more frequently, but table bloat
can still happen. Considering ways to make sure the table's FSM gets
vacuumed, there are several approaches:
(1) using an autovacuum work-item, (2) vacuuming the table's FSM in
PG_CATCH, (3) remembering which tables got cancelled and vacuuming
them after finishing the loop over table_oids.

With (1), there is the issue that the work-item queue fills up when
many tables get cancelled, and it's not a good idea to queue many
redundant work-items. With (2), vacuuming the table's FSM immediately
after a cancellation would not be a good idea, because right after the
cancellation the table is still likely to be locked by others. With
(3), it might work fine, though another autovacuum worker could vacuum
the table's FSM before we process the list. Still, that would be
better than always vacuuming them at the beginning of vacuum, so I'm
in favor of (3). Even when processing the tables in that list, we
should take the lock on the table conditionally so that autovacuum
doesn't block any foreground work. However, it's quite possible that
I'm not seeing the whole picture here.
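
Approach (3) can be sketched as a toy simulation. Everything here is
illustrative: run_worker, the flag arrays, and MAX_TABLES are
hypothetical stand-ins for the worker's loop over table_oids, the
per-table vacuum outcome, and a conditional lock attempt.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TABLES 16

/*
 * Toy model of approach (3): remember which tables got cancelled during
 * the main loop over table_oids, then make a second pass that vacuums
 * just their FSMs, taking the lock only conditionally.
 */
static int run_worker(const bool *got_cancelled, const bool *lock_available,
                      int ntables)
{
    int cancelled[MAX_TABLES];
    int ncancelled = 0;
    int fsm_vacuumed = 0;

    /* Main loop: don't retry immediately; whoever caused the
     * cancellation likely still holds conflicting locks. */
    for (int i = 0; i < ntables; i++)
        if (got_cancelled[i])
            cancelled[ncancelled++] = i;

    /* Retry pass: vacuum the FSM of cancelled tables, skipping any that
     * are still locked so autovacuum never blocks foreground work. */
    for (int i = 0; i < ncancelled; i++)
        if (lock_available[cancelled[i]])
            fsm_vacuumed++;          /* FreeSpaceMapVacuum() would go here */

    return fsm_vacuumed;
}
```

In this sketch a table that was cancelled but is still locked simply
stays in line for the next worker, which matches the
conditional-locking point above.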

>
> With that in mind, I'm noticing WorkItems have a avw_database that
> isn't checked by do_autovacuum. Is that right? Shouldn't only work
> items that belong to the database being autovacuumed be processed?
>>> That's why an initial FSM vacuum makes sense. It has a similar timing
>>> to the autovacuum work item, it has the benefit that it can be
>>> triggered manually (manual vacuum), and it's cheap and efficient.
>>>
>>>> Also the patch always vacuums fsm at beginning of the vacuum with a
>>>> threshold but it is useless if the table has been properly vacuumed. I
>>>> don't think it's good idea to add an additional step that "might be"
>>>> efficient, because current vacuum is already heavy.
>>>
>>> FSMs are several orders of magnitude smaller than the tables
>>> themselves. A TB-sized table I have here has a 95M FSM. If you add
>>> threshold skipping, that initial FSM vacuum *will* be efficient.
>>>
>>> By definition, the FSM will be less than 1/4000th of the table, so
>>> that initial FSM pass takes less than 1/4000th of the whole vacuum.
>>> Considerably less considering the simplicity of the task.
>>
>> I agree the fsm is very smaller than heap and vacuum on fsm will not
>> be comparatively heavy but I'm afraid that the vacuum will get more
>> heavy in the future if we pile up such improvement that are small but
>> might not be efficient. For example, a feature for reporting the last
>> vacuum status has been proposed[1]. I wonder if we can use it to
>> determine whether we do the fsm vacuum at beginning of vacuum.
>
> Yes, such a feature would allow skipping that initial FSM vacuum. That
> can be improved in a future patch if that proposed feature gets
> merged. This patch can be treated independently from that in the
> meanwhile, don't you think?

IMO, I'd like to have it after we could have such an optimization.
Otherwise it will result in adding extra steps for every vacuums until
we get the optimization. But, as you said, we also can think it's no
problem because the vacuum fsm of table at beginning of the vacuum has
the threshold. I'd like to hear more opinions about this.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>> After autovacuum gets cancelled, the next time it wakes up it will
>>>> retry vacuuming the cancelled relation. That's because a cancelled
>>>> autovacuum doesn't update the last-vacuumed stats.
>>>>
>>>> So the timing between an autovacuum work item and the next retry for
>>>> that relation is more or less an autovacuum nap time, except perhaps
>>>> in the case where many vacuums get cancelled, and they have to be
>>>> queued.
>>>
>>> I think that's not true if there are multiple databases.
>>
>> I'd have to re-check.
>>
>> The loop is basically, IIRC:
>>
>> while(1) { vacuum db ; work items ; nap }
>>
>> Now, if that's vacuum one db, not all, and if the decision on the
>> second run doesn't pick the same db because that big table failed to
>> be vacuumed, then you're right.
>>
>> In that case we could add the FSM vacuum as a work item *in addition*
>> to what this patch does. If the work item queue is full and the FSM
>> vacuum doesn't happen, it'd be no worse than with the patch as-is.
>>
>> Is that what you suggest?
>
> That's one of the my suggestion. I might had to consider this issue
> for each case. To be clear let me summarize for each case.
>
> For table with indices, vacuum on fsm of heap is done after
> lazy_vacuum_heap(). Therefore I think that the case where a table got
> vacuumed but fsm couldn't get vacuumed doesn't happen unless the
> autovacuum gets cancelled before or while vacuuming fsm.

Well, that's the whole point. Autovacuums get cancelled all the time
in highly contended tables. I have more than a handful of tables in
production that never finish autovacuum, so I have to trigger manual
vacuums periodically.

> (1) using autovacuum work-item, (2) vacuuming fsm of table in
> PG_CATCH, (3) remembering the tables got cancelled and vacuuming them
> after finished a loop of table_oids.
...
> So I'm in favor of (3).
...
> However, it's quite possible that I'm not seeing the whole picture here.

Well, none of that handles the case where the autovacuum of a table
doesn't get cancelled, but takes a very long time.

No free space becomes visible during long-running vacuums. That means
bloat keeps accumulating even though vacuum is freeing space, because
the FSM doesn't expose that free space.

The extra work incurred in those FSM vacuums isn't useless, it's
preventing bloat in long-running vacuums.

I can look into doing (3), which *might* get rid of the need to do that
initial FSM vacuum, but all other intermediate vacuums are still
needed.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Robert Haas
Date:
On Tue, Feb 6, 2018 at 7:51 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> No free space becomes visible during long-running vacuums. That means
> bloat keeps accumulating even though vacuum is freeing space, because
> the FSM doesn't expose that free space.
>
> The extra work incurred in those FSM vacuums isn't useless, it's
> preventing bloat in long-running vacuums.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Tue, Feb 6, 2018 at 2:55 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> On Mon, Feb 5, 2018 at 1:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Fri, Feb 2, 2018 at 11:13 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>>> After autovacuum gets cancelled, the next time it wakes up it will
>>>>> retry vacuuming the cancelled relation. That's because a cancelled
>>>>> autovacuum doesn't update the last-vacuumed stats.
>>>>>
>>>>> So the timing between an autovacuum work item and the next retry for
>>>>> that relation is more or less an autovacuum nap time, except perhaps
>>>>> in the case where many vacuums get cancelled, and they have to be
>>>>> queued.
>>>>
>>>> I think that's not true if there are multiple databases.
>>>
>>> I'd have to re-check.
>>>
>>> The loop is basically, IIRC:
>>>
>>> while(1) { vacuum db ; work items ; nap }
>>>
>>> Now, if that's vacuum one db, not all, and if the decision on the
>>> second run doesn't pick the same db because that big table failed to
>>> be vacuumed, then you're right.
>>>
>>> In that case we could add the FSM vacuum as a work item *in addition*
>>> to what this patch does. If the work item queue is full and the FSM
>>> vacuum doesn't happen, it'd be no worse than with the patch as-is.
>>>
>>> Is that what you suggest?
>>
>> That's one of the my suggestion. I might had to consider this issue
>> for each case. To be clear let me summarize for each case.
>>
>> For table with indices, vacuum on fsm of heap is done after
>> lazy_vacuum_heap(). Therefore I think that the case where a table got
>> vacuumed but fsm couldn't get vacuumed doesn't happen unless the
>> autovacuum gets cancelled before or while vacuuming fsm.
>
> Well, that's the whole point. Autovacuums get cancelled all the time
> in highly contended tables. I have more than a handful tables in
> production that never finish autovacuum, so I have to trigger manual
> vacuums periodically.
>
>> (1) using autovacuum work-item, (2) vacuuming fsm of table in
>> PG_CATCH, (3) remembering the tables got cancelled and vacuuming them
>> after finished a loop of table_oids.
> ...
>> So I'm in favor of (3).
> ...
>> However, it's quite possible that I'm not seeing the whole picture here.
>
> Well, none of that handles the case where the autovacuum of a table
> doesn't get cancelled, but takes a very long time.
>
> No free space becomes visible during long-running vacuums. That means
> bloat keeps accumulating even though vacuum is freeing space, because
> the FSM doesn't expose that free space.
>
> The extra work incurred in those FSM vacuums isn't useless, it's
> preventing bloat in long-running vacuums.
>
> I can look into doing 3, that *might* get rid of the need to do that
> initial FSM vacuum, but all other intermediate vacuums are still
> needed.

Understood. So how about making this patch focus only on vacuuming the
FSM more frequently, leaving the initial FSM vacuum and the handling
of cancellation cases aside? The rest can be a separate patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> I can look into doing 3, that *might* get rid of the need to do that
>> initial FSM vacuum, but all other intermediate vacuums are still
>> needed.
>
> Understood. So how about that this patch focuses only make FSM vacuum
> more frequently and leaves the initial FSM vacuum and the handling
> cancellation cases? The rest can be a separate patch.

Ok.

Attached are the split patches. Everything but the initial FSM pass is
in patch 1; the initial FSM pass, as in the original patch, is in
patch 2.

I will post an alternative patch for 2 when/if I have one. In the
meantime, we can talk about 1.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> I can look into doing 3, that *might* get rid of the need to do that
>>> initial FSM vacuum, but all other intermediate vacuums are still
>>> needed.
>>
>> Understood. So how about that this patch focuses only make FSM vacuum
>> more frequently and leaves the initial FSM vacuum and the handling
>> cancellation cases? The rest can be a separate patch.
>
> Ok.
>
> Attached split patches. All but the initial FSM pass is in 1, the
> initial FSM pass as in the original patch is in 2.
>
> I will post an alternative patch for 2 when/if I have one. In the
> meanwhile, we can talk about 1.

Thank you for updating the patch!

+            /* Tree pruning for partial vacuums */
+            if (threshold)
+            {
+                child_avail = fsm_get_avail(page, slot);
+                if (child_avail >= threshold)
+                    continue;
+            }

I think slots in a non-leaf page might not have a correct value,
because fsm_set_avail doesn't propagate values up beyond page
boundaries. So do we need to check the root of the child page rather
than a slot in the non-leaf page? I might be missing something though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>> I can look into doing 3, that *might* get rid of the need to do that
>>>> initial FSM vacuum, but all other intermediate vacuums are still
>>>> needed.
>>>
>>> Understood. So how about that this patch focuses only make FSM vacuum
>>> more frequently and leaves the initial FSM vacuum and the handling
>>> cancellation cases? The rest can be a separate patch.
>>
>> Ok.
>>
>> Attached split patches. All but the initial FSM pass is in 1, the
>> initial FSM pass as in the original patch is in 2.
>>
>> I will post an alternative patch for 2 when/if I have one. In the
>> meanwhile, we can talk about 1.
>
> Thank you for updating the patch!
>
> +            /* Tree pruning for partial vacuums */
> +            if (threshold)
> +            {
> +                child_avail = fsm_get_avail(page, slot);
> +                if (child_avail >= threshold)
> +                    continue;
> +            }
>
> I think slots in a non-leaf page might not have a correct value
> because fsm_set_avail doesn't propagate up beyond pages. So do we need
> to check the root of child page rather than a slot in the non-lead
> page? I might be missing something though.

I'm not sure I understand what you're asking.

Yes, a partial FSM vacuum won't correct those inconsistencies. It
doesn't matter though, because as long as the root node (or whichever
inner node we're looking at at the time) reports free space, the lookup
for free space will recurse and find the lower nodes, even if they
report more space than the inner node reported. The goal of making
free space discoverable will have been achieved nonetheless.

The final FSM vacuum pass isn't partial, to finally correct all those
small inconsistencies.
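
The lazy self-repair being relied on here can be modelled with a
two-level toy search. This is a sketch of the behavior, not the actual
fsm_search() code: one upper slot summarizes each leaf; a stale-high
upper slot causes one failed descent, gets corrected to the leaf's real
value, and the search starts over.

```c
#include <assert.h>

/*
 * Toy two-level FSM: one upper slot summarizes each leaf. An upper slot
 * can be stale-high (a partial vacuum pruned that branch) or stale-low
 * (new free space not yet propagated).
 */
typedef struct
{
    int upper[4];       /* advertised max free space per leaf */
    int leaf[4];        /* actual free space per leaf */
    int repairs;        /* stale-high upper slots fixed during search */
} toy_fsm;

/* Find a leaf with at least 'needed' space, repairing stale-high upper
 * slots on failed descents and retrying, as the real search does. */
static int toy_search(toy_fsm *fsm, int needed)
{
    for (;;)
    {
        int best = -1;

        for (int i = 0; i < 4; i++)
            if (fsm->upper[i] >= needed &&
                (best < 0 || fsm->upper[i] > fsm->upper[best]))
                best = i;
        if (best < 0)
            return -1;              /* no discoverable free space */
        if (fsm->leaf[best] >= needed)
            return best;            /* descent succeeded */
        /* Upper level overstated the leaf: update it to avoid falling
         * into the same trap again, and start over. */
        fsm->upper[best] = fsm->leaf[best];
        fsm->repairs++;
    }
}

/* Scenario: slot 0 is stale-high (200 advertised, 10 real); the search
 * repairs it once, then satisfies the request via slot 1. */
static int demo(void)
{
    toy_fsm fsm = {{200, 90, 0, 0}, {10, 80, 0, 0}, 0};
    int found = toy_search(&fsm, 60);

    return (found == 1 && fsm.upper[0] == 10 && fsm.repairs == 1) ? 1 : 0;
}
```

Note the symmetric stale-low case (a leaf with space the upper level
doesn't advertise) is exactly what only an FSM vacuum can expose, which
is the subject of this thread.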


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Fri, Feb 9, 2018 at 11:48 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>>> I can look into doing 3, that *might* get rid of the need to do that
>>>>> initial FSM vacuum, but all other intermediate vacuums are still
>>>>> needed.
>>>>
>>>> Understood. So how about that this patch focuses only make FSM vacuum
>>>> more frequently and leaves the initial FSM vacuum and the handling
>>>> cancellation cases? The rest can be a separate patch.
>>>
>>> Ok.
>>>
>>> Attached split patches. All but the initial FSM pass is in 1, the
>>> initial FSM pass as in the original patch is in 2.
>>>
>>> I will post an alternative patch for 2 when/if I have one. In the
>>> meanwhile, we can talk about 1.
>>
>> Thank you for updating the patch!
>>
>> +            /* Tree pruning for partial vacuums */
>> +            if (threshold)
>> +            {
>> +                child_avail = fsm_get_avail(page, slot);
>> +                if (child_avail >= threshold)
>> +                    continue;
>> +            }
>>
>> I think slots in a non-leaf page might not have a correct value
>> because fsm_set_avail doesn't propagate up beyond pages. So do we need
>> to check the root of child page rather than a slot in the non-lead
>> page? I might be missing something though.
>
> I'm not sure I understand what you're asking.
>
> Yes, a partial FSM vacuum won't correct those inconsistencies. It
> doesn't matter though, because as long as the root node (or whichever
> inner node we're looking at the time) reports free space, the lookup
> for free space will recurse and find the lower nodes, even if they
> report more space than the inner node reported. The goal of making
> free space discoverable will have been achieved nonetheless.

For example, if a slot of an internal node of the FSM tree has a value
greater than the threshold while the root of the leaf page below it
has a value less than the threshold, we'd want to update the internal
node but your patch will skip it. I wonder if this can happen.

> The final FSM vacuum pass isn't partial, to finally correct all those
> small inconsistencies.

Yes, but isn't the purpose of this patch to prevent table bloat from
concurrent modifications?

Here is other review comments.

+            /* Tree pruning for partial vacuums */
+            if (threshold)
+            {
+                child_avail = fsm_get_avail(page, slot);
+                if (child_avail >= threshold)
+                    continue;
+            }

'child_avail' is a category value while 'threshold' is an actual size
of free space.

The logic for finding max_freespace does not seem to work correctly
for tables with indexes, because max_freespace is not updated based on
the post-compaction free space. I think we need to get the maximum
free space size while vacuuming the heap (i.e. at lazy_vacuum_heap)
and use it as the threshold.

+       vacuum_fsm_every_pages = nblocks / 8;
+       vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, 1048576);

I think a comment for 1048576 is needed. And perhaps we can define it
as a macro.
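
A hedged sketch of what such a macro and its use might look like. The
name VACUUM_FSM_EVERY_PAGES is illustrative, not from the actual patch;
the constant matches it (1,048,576 pages = 8GB at the default 8KB block
size).

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192                     /* default PostgreSQL block size */

/*
 * Vacuum the FSM every 8GB of processed heap (1,048,576 pages of 8KB
 * each), or every 1/8th of the relation, whichever is greater.
 */
#define VACUUM_FSM_EVERY_PAGES ((uint64_t) 8 * 1024 * 1024 * 1024 / BLCKSZ)

/* Interval, in heap pages, between partial FSM vacuums. */
static uint64_t vacuum_fsm_interval(uint64_t nblocks)
{
    uint64_t every = nblocks / 8;

    return every > VACUUM_FSM_EVERY_PAGES ? every : VACUUM_FSM_EVERY_PAGES;
}
```

Taking the maximum keeps the floor at 8GB so small tables don't pay for
intermediate FSM passes, while very large tables still get at most
eight of them.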

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Feb 9, 2018 at 11:48 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Fri, Feb 9, 2018 at 1:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>>>> I can look into doing 3, that *might* get rid of the need to do that
>>>>>> initial FSM vacuum, but all other intermediate vacuums are still
>>>>>> needed.
>>>>>
>>>>> Understood. So how about that this patch focuses only make FSM vacuum
>>>>> more frequently and leaves the initial FSM vacuum and the handling
>>>>> cancellation cases? The rest can be a separate patch.
>>>>
>>>> Ok.
>>>>
>>>> Attached split patches. All but the initial FSM pass is in 1, the
>>>> initial FSM pass as in the original patch is in 2.
>>>>
>>>> I will post an alternative patch for 2 when/if I have one. In the
>>>> meanwhile, we can talk about 1.
>>>
>>> Thank you for updating the patch!
>>>
>>> +            /* Tree pruning for partial vacuums */
>>> +            if (threshold)
>>> +            {
>>> +                child_avail = fsm_get_avail(page, slot);
>>> +                if (child_avail >= threshold)
>>> +                    continue;
>>> +            }
>>>
>>> I think slots in a non-leaf page might not have a correct value
>>> because fsm_set_avail doesn't propagate up beyond pages. So do we need
>>> to check the root of child page rather than a slot in the non-lead
>>> page? I might be missing something though.
>>
>> I'm not sure I understand what you're asking.
>>
>> Yes, a partial FSM vacuum won't correct those inconsistencies. It
>> doesn't matter though, because as long as the root node (or whichever
>> inner node we're looking at the time) reports free space, the lookup
>> for free space will recurse and find the lower nodes, even if they
>> report more space than the inner node reported. The goal of making
>> free space discoverable will have been achieved nonetheless.
>
> For example, if a slot of internal node of fsm tree has a value
> greater than the threshold while the root of leaf node of fsm tree has
> a value less than threshold, we want to update the internal node of
> fsm tree but we will skip it with your patch. I wonder if this can
> happen.

If I understand your point correctly, that would mean that space was
used since the last vacuum. Partial FSM vacuums don't concern
themselves with that case, the normal free space search algorithm will
handle that, retrying with the next slot until it succeeds (or until
it gives up).

IIUC, each time a failure of that kind is found while searching, the
FSM gets updated to avoid following that link a second time.

See, in fsm_search:

/*
 * At lower level, failure can happen if the value in the upper-
 * level node didn't reflect the value on the lower page. Update
 * the upper node, to avoid falling into the same trap again, and
 * start over.
 */


>
>> The final FSM vacuum pass isn't partial, to finally correct all those
>> small inconsistencies.
>
> Yes, but the purpose of this patch is to prevent table bloating from
> concurrent modifications?

Yes, by *marking* unmarked space. If the slot is above the threshold,
it means there's already marked free space on that branch, and search
will go into it already, so no pressing need to refine the mark. The
space already marked can serve while vacuum makes further progress.
Ie: it can wait.

> Here is other review comments.
>
> +            /* Tree pruning for partial vacuums */
> +            if (threshold)
> +            {
> +                child_avail = fsm_get_avail(page, slot);
> +                if (child_avail >= threshold)
> +                    continue;
> +            }
>
> 'child_avail' is a category value while 'threshold' is an actual size
> of freespace.

Good catch. It was meant to be a category, but I see the public
interface of the FSM doesn't usually expose categories, so fixing
that.
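
The fix amounts to converting the byte-sized threshold to the FSM's
one-byte category scale before comparing it against fsm_get_avail()
values. A simplified sketch of that mapping, modelled on PostgreSQL's
256 categories of BLCKSZ/256 = 32 bytes each (not the exact internal
conversion function):

```c
#include <assert.h>

#define BLCKSZ 8192
#define FSM_CATEGORIES 256
#define FSM_CAT_STEP (BLCKSZ / FSM_CATEGORIES)   /* 32 bytes per category */

/* Map a free-space amount in bytes to the FSM's 0..255 category scale,
 * so byte thresholds compare consistently with stored slot values. */
static int space_to_cat(int avail_bytes)
{
    int cat = avail_bytes / FSM_CAT_STEP;

    return cat > FSM_CATEGORIES - 1 ? FSM_CATEGORIES - 1 : cat;
}
```

With this, the pruning check becomes a category-vs-category comparison
instead of comparing a category against raw bytes.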

> The logic of finding the max_freespace seems not work fine if table
> with indices because max_freespace is not updated based on
> post-compaction freespace. I think we need to get the max freespace
> size during vacuuming heap (i.g. at lazy_vacuum_heap) and use it as
> the threshold.

Good point.

>
> +       vacuum_fsm_every_pages = nblocks / 8;
> +       vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, 1048576);
>
> I think a comment for 1048576 is needed. And perhaps we can define it
> as a macro.

1M pages = 8GB with an 8KB page size.

I can clarify.

Attached are new versions of the patches with these corrections.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>>> The final FSM vacuum pass isn't partial, to finally correct all those
>>> small inconsistencies.
>>
>> Yes, but the purpose of this patch is to prevent table bloating from
>> concurrent modifications?
>
> Yes, by *marking* unmarked space. If the slot is above the threshold,
> it means there's already marked free space on that branch, and search
> will go into it already, so no pressing need to refine the mark. The
> space already marked can serve while vacuum makes further progress.
> Ie: it can wait.

Although... I think I see what you mean.

If you have this:

255
.   0
.   .   0
.   .   255
.   0
.   .   64
.   .   64
.   0
.   .   0
.   .   64


A partial vacuum won't even recurse beyond the root node, so it won't
expose the free space 2 levels down.

This could be arrived at by having an almost fully packed table that
contains some free space near the end, and it gets deletions near the
beginning. Vacuum will mark the leaf levels at the beginning of the
table, and we'd like for that free space to be discoverable to avoid
packing more rows near the end where they prevent truncation.

The patch as it is now will only vacuum that part of the FSM when the
root value drops, which usually requires all the free space on that
region of the heap to be exhausted.

With the current recursive FSM vacuum method, however, that's
unavoidable. We can't detect that there's some free space below the
current level to be exposed without recursing into the tree, and
recursing is exactly the kind of work we want to prevent with tree
pruning.

In all my trials, however, I did not run into that case. It must not
be easy to get into that situation, because the root node already has
~4000 slots in it. Each of those 4000 slots should be in that
situation to keep the space at the end of the table as the sole
discoverable free space, which seems like a rather contorted worst
case.
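
The fan-out arithmetic behind that argument, as a quick sketch (the
~4000 slots-per-page figure is approximate; the real count is around
4096 minus headers):

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS_PER_FSM_PAGE 4000     /* approximate per-page slot count */

/* Heap pages addressable by an FSM tree with the given number of levels. */
static uint64_t fsm_coverage(int levels)
{
    uint64_t pages = 1;

    for (int i = 0; i < levels; i++)
        pages *= SLOTS_PER_FSM_PAGE;
    return pages;
}
```

Three levels give 4000^3 = 64 billion addressable heap pages, on the
order of 512TB at 8KB per page; each root slot alone covers 4000^2
pages (~128GB), so all ~4000 root slots being in the pathological state
at once is indeed a contorted worst case.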


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Fri, Feb 16, 2018 at 5:00 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>>>> The final FSM vacuum pass isn't partial, to finally correct all those
>>>> small inconsistencies.
>>>
>>> Yes, but the purpose of this patch is to prevent table bloating from
>>> concurrent modifications?
>>
>> Yes, by *marking* unmarked space. If the slot is above the threshold,
>> it means there's already marked free space on that branch, and search
>> will go into it already, so no pressing need to refine the mark. The
>> space already marked can serve while vacuum makes further progress.
>> Ie: it can wait.
>
> Although... I think I see what you mean.
>
> If you have this:
>
> 255
> .   0
> .   .   0
> .   .   255
> .   0
> .   .   64
> .   .   64
> .   0
> .   .   0
> .   .   64
>
>
> A partial vacuum won't even recurse beyond the root node, so it won't
> expose the free space 2 levels down.

Yeah, that's what I meant. I think this can happen rather easily if a
table has fillfactor < 100. For example, updating tuples with larger
values on pages that are almost packed except for the fillfactor
reserve.

>
> This could be arrived at by having an almost fully packed table that
> contains some free space near the end, and it gets deletions near the
> beginning. Vacuum will mark the leaf levels at the beginning of the
> table, and we'd like for that free space to be discoverable to avoid
> packing more rows near the end where they prevent truncation.
>
> The patch as it is now will only vacuum that part of the FSM when the
> root value drops, which usually requires all the free space on that
> region of the heap to be exhausted.
>
> With the current recursive FSM vacuum method, however, that's
> unavoidable. We can't detect that there's some free space below the
> current level to be exposed without recursing into the tree, and
> recursing is exactly the kind of work we want to prevent with tree
> pruning.
>
> In all my trials, however, I did not run into that case. It must not
> be easy to get into that situation, because the root node already has
> ~4000 slots in it. Each of those 4000 slots should be in that
> situation to keep the space at the end of the table as the sole
> discoverable free space, which seems like a rather contorted worst
> case.

Okay, I guess this patch cannot help in the case where the free space
on pages doesn't get exposed in the FSM by vacuum because the
upper-level node doesn't reflect the value on the lower page. However,
as you said, it doesn't happen often, and when it does we're no worse
off than today. So I also think it would not become a problem.

I'll review v4 patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Tue, Feb 20, 2018 at 5:04 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Feb 16, 2018 at 5:00 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>>>> The final FSM vacuum pass isn't partial, to finally correct all those
>>>>> small inconsistencies.
>>>>
>>>> Yes, but the purpose of this patch is to prevent table bloating from
>>>> concurrent modifications?
>>>
>>> Yes, by *marking* unmarked space. If the slot is above the threshold,
>>> it means there's already marked free space on that branch, and search
>>> will go into it already, so no pressing need to refine the mark. The
>>> space already marked can serve while vacuum makes further progress.
>>> Ie: it can wait.
>>
>> Although... I think I see what you mean.
>>
>> If you have this:
>>
>> 255
>> .   0
>> .   .   0
>> .   .   255
>> .   0
>> .   .   64
>> .   .   64
>> .   0
>> .   .   0
>> .   .   64
>>
>>
>> A partial vacuum won't even recurse beyond the root node, so it won't
>> expose the free space 2 levels down.
>
> Yeah, that's what I meant. I think this might be able to happen
> slightly easily if a tables has fillfactor < 100. For example,
> updating tuples on pages that are almost packed except for fillfactor
> with the more bigger value
>
>>
>> This could be arrived at by having an almost fully packed table that
>> contains some free space near the end, and it gets deletions near the
>> beginning. Vacuum will mark the leaf levels at the beginning of the
>> table, and we'd like for that free space to be discoverable to avoid
>> packing more rows near the end where they prevent truncation.
>>
>> The patch as it is now will only vacuum that part of the FSM when the
>> root value drops, which usually requires all the free space on that
>> region of the heap to be exhausted.
>>
>> With the current recursive FSM vacuum method, however, that's
>> unavoidable. We can't detect that there's some free space below the
>> current level to be exposed without recursing into the tree, and
>> recursing is exactly the kind of work we want to prevent with tree
>> pruning.
>>
>> In all my trials, however, I did not run into that case. It must not
>> be easy to get into that situation, because the root node already has
>> ~4000 slots in it. Each of those 4000 slots should be in that
>> situation to keep the space at the end of the table as the sole
>> discoverable free space, which seems like a rather contorted worst
>> case.
>
> Okay, I guess that this patch cannot help in the case where the
> freespace of pages shown on fsm gets decreased by vacuum because the
> upper-level node doesn't reflect the value on the lower page. However
> it doesn't happen often as you said and it's the same as it is even
> though it happens. So I also think it would not become a problem.
>
> I'll review v4 patch.
>

Here are review comments for the v4 patch.

@@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
                 * We don't insert a vacuum delay point here, because we have an
                 * exclusive lock on the table which we want to hold for as short a
                 * time as possible.  We still need to check for interrupts however.
+                * We might have to acquire the autovacuum lock, however, but that
+                * shouldn't pose a deadlock risk.
                 */
                CHECK_FOR_INTERRUPTS();

Is this change necessary?

----
+               /*
+                * If there are no indexes then we should periodically vacuum the FSM
+                * on huge relations to make free space visible early.
+                */
+               if (nindexes == 0 &&
+                       (vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages)
+               {
+                       /* Vacuum the Free Space Map */
+                       FreeSpaceMapVacuum(onerel, max_freespace);
+                       vacuumed_pages_at_fsm_vac = vacuumed_pages;
+                       max_freespace = 0;
+               }

I think this code block should be executed before we check whether the page
is new or empty and then 'continue'. Otherwise we can never reach this code
if the table has a lot of new or empty pages.
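A minimal sketch (hypothetical control flow, mine, not the real lazy_scan_heap) of the reordering being suggested, where the periodic check runs before the new/empty fast path:

```python
# Hypothetical sketch of the suggested loop order: evaluate the
# periodic-FSM-vacuum condition before the fast path that skips
# new/empty pages, so a long run of such pages cannot keep the
# condition from ever being reached.

def scan_heap(pages, nindexes, vacuum_fsm_every_pages):
    fsm_vacuums = 0
    freespace_updated_pages = 0
    for page in pages:
        # Periodic FSM vacuum first ...
        if nindexes == 0 and freespace_updated_pages > vacuum_fsm_every_pages:
            fsm_vacuums += 1                 # stands in for FreeSpaceMapVacuum()
            freespace_updated_pages = 0
        # ... then the new/empty fast path.
        if page == "empty":
            continue
        freespace_updated_pages += 1         # this page's free space was updated
    return fsm_vacuums

# 5 dirtied pages push the counter past the threshold even though the
# scan then hits a long run of empty pages:
print(scan_heap(["dirty"] * 5 + ["empty"] * 10, 0, 4))   # -> 1
```

If the same condition is instead placed after the `continue` (the order being objected to), the ten trailing empty pages would never evaluate it and no periodic FSM vacuum would run.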

----
@@ -785,7 +789,7 @@ fsm_search(Relation rel, uint8 min_cat)
  * Recursive guts of FreeSpaceMapVacuum
  */
 static uint8
-fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
+fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof_p)
 {
        Buffer          buf;
        Page            page;

I think a comment for 'threshold' is needed. Maybe for
FreeSpaceMapVacuum as well?

----
@@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
  * If minValue > 0, the updated page is also searched for a page with at
  * least minValue of free space. If one is found, its slot number is
  * returned, -1 otherwise.
+ *
+ * If minValue == 0, the value at the root node is returned.
  */
 static int
 fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
@@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,

                                                addr.level == FSM_BOTTOM_LEVEL,
                                                true);
        }
+       else
+               newslot = fsm_get_avail(page, 0);

I think it's not a good idea to make fsm_set_and_search return either a
slot number or a category value depending on the argument. Category
values are actually uint8, but this function returns int. Also, we can
call fsm_set_and_search with minValue = 0 at
RecordAndGetPageWithFreeSpace() when search_cat is 0. With your patch,
fsm_set_and_search() then returns the root value, but that's not correct.
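To illustrate the concern, a toy single-page model (the names are modeled on the patch, but the code is mine): one function that returns either a slot index or a category depending on its argument mixes two result domains, while a separate set-and-get keeps them apart.

```python
# Toy single-page FSM: slots[i] holds the free-space category of slot i.

def fsm_set_and_search(slots, slot, value, min_value):
    """Set a slot, then search for any slot >= min_value.
    Returns a slot number, or -1 if none qualifies."""
    slots[slot] = value
    for i, v in enumerate(slots):
        if min_value > 0 and v >= min_value:
            return i
    return -1

def fsm_set_and_get(slots, slot, value):
    """Set a slot and return the page's new maximum category --
    the value that must bubble up to the parent level."""
    slots[slot] = value
    return max(slots)

slots = [10, 200, 30]
print(fsm_set_and_search(slots, 0, 50, 100))  # 1: a slot index
print(fsm_set_and_get(slots, 2, 255))         # 255: a category, not a slot
```

Keeping the two operations separate also avoids the minValue == 0 ambiguity: a caller asking not to search never receives a category value masquerading as a slot number.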

Also, is this change necessary for this patch? ISTM it is only used for
the change in fsm_update_recursive() as follows, but I guess it could be
a separate patch.

+       new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
+
+       /*
+        * Bubble up, not the value we just set, but the one now in the root
+        * node of the just-updated page, which is the page's highest value.
+        */

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Here is review comment for v4 patch.
>
> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel,
> LVRelStats *vacrelstats)
>                  * We don't insert a vacuum delay point here, because we have an
>                  * exclusive lock on the table which we want to hold
> for as short a
>                  * time as possible.  We still need to check for
> interrupts however.
> +                * We might have to acquire the autovacuum lock,
> however, but that
> +                * shouldn't pose a deadlock risk.
>                  */
>                 CHECK_FOR_INTERRUPTS();
>
> Is this change necessary?

I don't recall doing that change. Maybe a rebase gone wrong.

> ----
> +               /*
> +                * If there are no indexes then we should periodically
> vacuum the FSM
> +                * on huge relations to make free space visible early.
> +                */
> +               if (nindexes == 0 &&
> +                       (vacuumed_pages - vacuumed_pages_at_fsm_vac) >
> vacuum_fsm_every_pages)
> +               {
> +                       /* Vacuum the Free Space Map */
> +                       FreeSpaceMapVacuum(onerel, max_freespace);
> +                       vacuumed_pages_at_fsm_vac = vacuumed_pages;
> +                       max_freespace = 0;
> +               }
>
> I think this code block should be executed before we check if the page
> is whether new or empty and then do 'continue'. Otherwise we cannot
> reach this code if the table has a lot of new or empty pages.

In order for the counter (vacuumed_pages) to increase, there have to
be plenty of opportunities for this code to run, and I specifically
wanted to avoid vacuuming the FSM too often in precisely those cases
(when vacuum scans lots of pages but does no writes).

> ----
> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>   * If minValue > 0, the updated page is also searched for a page with at
>   * least minValue of free space. If one is found, its slot number is
>   * returned, -1 otherwise.
> + *
> + * If minValue == 0, the value at the root node is returned.
>   */
>  static int
>  fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr,
> uint16 slot,
>
> addr.level == FSM_BOTTOM_LEVEL,
>                                                                    true);
>         }
> +       else
> +               newslot = fsm_get_avail(page, 0);
>
> I think it's not good idea to make fsm_set_and_search return either a
> slot number or a category value according to the argument. Category
> values is actually uint8 but this function returns it as int.

Should be fine; a uint8 can be contained inside an int on all platforms.

> Also we
> can call fsm_set_and_search with minValue = 0 at
> RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch,
> fsm_set_and_search() then returns the root value but it's not correct.

I guess I can add another function to do that.

> Also, is this change necessary for this patch? ISTM this change is
> used for the change in fsm_update_recursive() as follows but I guess
> this change can be a separate patch.
>
> +       new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
> +
> +       /*
> +        * Bubble up, not the value we just set, but the one now in the root
> +        * node of the just-updated page, which is the page's highest value.
> +        */

I can try to separate them I guess.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> ----
>> +               /*
>> +                * If there are no indexes then we should periodically
>> vacuum the FSM
>> +                * on huge relations to make free space visible early.
>> +                */
>> +               if (nindexes == 0 &&
>> +                       (vacuumed_pages - vacuumed_pages_at_fsm_vac) >
>> vacuum_fsm_every_pages)
>> +               {
>> +                       /* Vacuum the Free Space Map */
>> +                       FreeSpaceMapVacuum(onerel, max_freespace);
>> +                       vacuumed_pages_at_fsm_vac = vacuumed_pages;
>> +                       max_freespace = 0;
>> +               }
>>
>> I think this code block should be executed before we check if the page
>> is whether new or empty and then do 'continue'. Otherwise we cannot
>> reach this code if the table has a lot of new or empty pages.
>
> In order for the counter (vacuumed_pages) to increase, there have to
> be plenty of opportunities for this code to run, and I specifically
> wanted to avoid vacuuming the FSM too often for those cases
> particularly (when Vacuum scans lots of pages but does no writes).

Wait, I see what you mean. Entirely empty pages.

Ok, I can move it.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Here is review comment for v4 patch.
>>
>> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel,
>> LVRelStats *vacrelstats)
>>                  * We don't insert a vacuum delay point here, because we have an
>>                  * exclusive lock on the table which we want to hold
>> for as short a
>>                  * time as possible.  We still need to check for
>> interrupts however.
>> +                * We might have to acquire the autovacuum lock,
>> however, but that
>> +                * shouldn't pose a deadlock risk.
>>                  */
>>                 CHECK_FOR_INTERRUPTS();
>>
>> Is this change necessary?
>
> I don't recall doing that change. Maybe a rebase gone wrong.
>
>> ----
>> +               /*
>> +                * If there are no indexes then we should periodically
>> vacuum the FSM
>> +                * on huge relations to make free space visible early.
>> +                */
>> +               if (nindexes == 0 &&
>> +                       (vacuumed_pages - vacuumed_pages_at_fsm_vac) >
>> vacuum_fsm_every_pages)
>> +               {
>> +                       /* Vacuum the Free Space Map */
>> +                       FreeSpaceMapVacuum(onerel, max_freespace);
>> +                       vacuumed_pages_at_fsm_vac = vacuumed_pages;
>> +                       max_freespace = 0;
>> +               }
>>
>> I think this code block should be executed before we check if the page
>> is whether new or empty and then do 'continue'. Otherwise we cannot
>> reach this code if the table has a lot of new or empty pages.
>
> In order for the counter (vacuumed_pages) to increase, there have to
> be plenty of opportunities for this code to run, and I specifically
> wanted to avoid vacuuming the FSM too often for those cases
> particularly (when Vacuum scans lots of pages but does no writes).
>
>> ----
>> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>>   * If minValue > 0, the updated page is also searched for a page with at
>>   * least minValue of free space. If one is found, its slot number is
>>   * returned, -1 otherwise.
>> + *
>> + * If minValue == 0, the value at the root node is returned.
>>   */
>>  static int
>>  fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
>> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr,
>> uint16 slot,
>>
>> addr.level == FSM_BOTTOM_LEVEL,
>>                                                                    true);
>>         }
>> +       else
>> +               newslot = fsm_get_avail(page, 0);
>>
>> I think it's not good idea to make fsm_set_and_search return either a
>> slot number or a category value according to the argument. Category
>> values is actually uint8 but this function returns it as int.
>
> Should be fine, uint8 can be contained inside an int in all platforms.
>
>> Also we
>> can call fsm_set_and_search with minValue = 0 at
>> RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch,
>> fsm_set_and_search() then returns the root value but it's not correct.
>
> I guess I can add another function to do that.
>
>> Also, is this change necessary for this patch? ISTM this change is
>> used for the change in fsm_update_recursive() as follows but I guess
>> this change can be a separate patch.
>>
>> +       new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
>> +
>> +       /*
>> +        * Bubble up, not the value we just set, but the one now in the root
>> +        * node of the just-updated page, which is the page's highest value.
>> +        */
>
> I can try to separate them I guess.

Patch set with the requested changes attached.

Patch 2 didn't change AFAIK; it's just rebased on top of the new 1.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Mon, Feb 26, 2018 at 1:32 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Here is review comment for v4 patch.
>>>
>>> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel,
>>> LVRelStats *vacrelstats)
>>>                  * We don't insert a vacuum delay point here, because we have an
>>>                  * exclusive lock on the table which we want to hold
>>> for as short a
>>>                  * time as possible.  We still need to check for
>>> interrupts however.
>>> +                * We might have to acquire the autovacuum lock,
>>> however, but that
>>> +                * shouldn't pose a deadlock risk.
>>>                  */
>>>                 CHECK_FOR_INTERRUPTS();
>>>
>>> Is this change necessary?
>>
>> I don't recall doing that change. Maybe a rebase gone wrong.
>>
>>> ----
>>> +               /*
>>> +                * If there are no indexes then we should periodically
>>> vacuum the FSM
>>> +                * on huge relations to make free space visible early.
>>> +                */
>>> +               if (nindexes == 0 &&
>>> +                       (vacuumed_pages - vacuumed_pages_at_fsm_vac) >
>>> vacuum_fsm_every_pages)
>>> +               {
>>> +                       /* Vacuum the Free Space Map */
>>> +                       FreeSpaceMapVacuum(onerel, max_freespace);
>>> +                       vacuumed_pages_at_fsm_vac = vacuumed_pages;
>>> +                       max_freespace = 0;
>>> +               }
>>>
>>> I think this code block should be executed before we check if the page
>>> is whether new or empty and then do 'continue'. Otherwise we cannot
>>> reach this code if the table has a lot of new or empty pages.
>>
>> In order for the counter (vacuumed_pages) to increase, there have to
>> be plenty of opportunities for this code to run, and I specifically
>> wanted to avoid vacuuming the FSM too often for those cases
>> particularly (when Vacuum scans lots of pages but does no writes).
>>
>>> ----
>>> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>>>   * If minValue > 0, the updated page is also searched for a page with at
>>>   * least minValue of free space. If one is found, its slot number is
>>>   * returned, -1 otherwise.
>>> + *
>>> + * If minValue == 0, the value at the root node is returned.
>>>   */
>>>  static int
>>>  fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
>>> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr,
>>> uint16 slot,
>>>
>>> addr.level == FSM_BOTTOM_LEVEL,
>>>                                                                    true);
>>>         }
>>> +       else
>>> +               newslot = fsm_get_avail(page, 0);
>>>
>>> I think it's not good idea to make fsm_set_and_search return either a
>>> slot number or a category value according to the argument. Category
>>> values is actually uint8 but this function returns it as int.
>>
>> Should be fine, uint8 can be contained inside an int in all platforms.
>>
>>> Also we
>>> can call fsm_set_and_search with minValue = 0 at
>>> RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch,
>>> fsm_set_and_search() then returns the root value but it's not correct.
>>
>> I guess I can add another function to do that.
>>
>>> Also, is this change necessary for this patch? ISTM this change is
>>> used for the change in fsm_update_recursive() as follows but I guess
>>> this change can be a separate patch.
>>>
>>> +       new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
>>> +
>>> +       /*
>>> +        * Bubble up, not the value we just set, but the one now in the root
>>> +        * node of the just-updated page, which is the page's highest value.
>>> +        */
>>
>> I can try to separate them I guess.
>
> Patch set with the requested changes attached.
>
> 2 didn't change AFAIK, it's just rebased on top of the new 1.

Oops. Forgot about the comment changes requested. Fixed those in v6, attached.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Tue, Feb 27, 2018 at 1:44 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Feb 26, 2018 at 1:32 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Feb 26, 2018 at 11:31 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> On Mon, Feb 26, 2018 at 6:00 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> Here is review comment for v4 patch.
>>>>
>>>> @@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel,
>>>> LVRelStats *vacrelstats)
>>>>                  * We don't insert a vacuum delay point here, because we have an
>>>>                  * exclusive lock on the table which we want to hold
>>>> for as short a
>>>>                  * time as possible.  We still need to check for
>>>> interrupts however.
>>>> +                * We might have to acquire the autovacuum lock,
>>>> however, but that
>>>> +                * shouldn't pose a deadlock risk.
>>>>                  */
>>>>                 CHECK_FOR_INTERRUPTS();
>>>>
>>>> Is this change necessary?
>>>
>>> I don't recall doing that change. Maybe a rebase gone wrong.
>>>
>>>> ----
>>>> +               /*
>>>> +                * If there are no indexes then we should periodically
>>>> vacuum the FSM
>>>> +                * on huge relations to make free space visible early.
>>>> +                */
>>>> +               if (nindexes == 0 &&
>>>> +                       (vacuumed_pages - vacuumed_pages_at_fsm_vac) >
>>>> vacuum_fsm_every_pages)
>>>> +               {
>>>> +                       /* Vacuum the Free Space Map */
>>>> +                       FreeSpaceMapVacuum(onerel, max_freespace);
>>>> +                       vacuumed_pages_at_fsm_vac = vacuumed_pages;
>>>> +                       max_freespace = 0;
>>>> +               }
>>>>
>>>> I think this code block should be executed before we check if the page
>>>> is whether new or empty and then do 'continue'. Otherwise we cannot
>>>> reach this code if the table has a lot of new or empty pages.
>>>
>>> In order for the counter (vacuumed_pages) to increase, there have to
>>> be plenty of opportunities for this code to run, and I specifically
>>> wanted to avoid vacuuming the FSM too often for those cases
>>> particularly (when Vacuum scans lots of pages but does no writes).
>>>
>>>> ----
>>>> @@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>>>>   * If minValue > 0, the updated page is also searched for a page with at
>>>>   * least minValue of free space. If one is found, its slot number is
>>>>   * returned, -1 otherwise.
>>>> + *
>>>> + * If minValue == 0, the value at the root node is returned.
>>>>   */
>>>>  static int
>>>>  fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
>>>> @@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr,
>>>> uint16 slot,
>>>>
>>>> addr.level == FSM_BOTTOM_LEVEL,
>>>>                                                                    true);
>>>>         }
>>>> +       else
>>>> +               newslot = fsm_get_avail(page, 0);
>>>>
>>>> I think it's not good idea to make fsm_set_and_search return either a
>>>> slot number or a category value according to the argument. Category
>>>> values is actually uint8 but this function returns it as int.
>>>
>>> Should be fine, uint8 can be contained inside an int in all platforms.
>>>
>>>> Also we
>>>> can call fsm_set_and_search with minValue = 0 at
>>>> RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch,
>>>> fsm_set_and_search() then returns the root value but it's not correct.
>>>
>>> I guess I can add another function to do that.
>>>
>>>> Also, is this change necessary for this patch? ISTM this change is
>>>> used for the change in fsm_update_recursive() as follows but I guess
>>>> this change can be a separate patch.
>>>>
>>>> +       new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
>>>> +
>>>> +       /*
>>>> +        * Bubble up, not the value we just set, but the one now in the root
>>>> +        * node of the just-updated page, which is the page's highest value.
>>>> +        */
>>>
>>> I can try to separate them I guess.
>>
>> Patch set with the requested changes attached.
>>
>> 2 didn't change AFAIK, it's just rebased on top of the new 1.
>
> Oops. Forgot about the comment changes requested. Fixed those in v6, attached.

Thank you for updating patches!

0001 patch looks good to me except for the following unnecessary empty lines.

+                * If there are no indexes then we should periodically vacuum the FSM
+                * on huge relations to make free space visible early.
+                */
+               else if (nindexes == 0 && fsm_updated_pages > vacuum_fsm_every_pages)
+               {
+                       /* Vacuum the Free Space Map */
+                       FreeSpaceMapVacuum(onerel, max_freespace);
+                       fsm_updated_pages = 0;
+                       max_freespace = 0;
+               }
+
+
+               /*

@@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,

                                        vmbuffer, InvalidTransactionId,
                                        VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
                                END_CRIT_SECTION();
+
                        }

0002 patch looks good to me.

For the 0003 patch, I think the fsm_set() function name could be
misleading, because it not only sets the value but also returns it.
fsm_set_and_get() might be better?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Mon, Feb 26, 2018 at 10:20 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you for updating patches!
>
> 0001 patch looks good to me except for the following unnecessary empty lines.
>
> +                * If there are no indexes then we should periodically
> vacuum the FSM
> +                * on huge relations to make free space visible early.
> +                */
> +               else if (nindexes == 0 && fsm_updated_pages >
> vacuum_fsm_every_pages)
> +               {
> +                       /* Vacuum the Free Space Map */
> +                       FreeSpaceMapVacuum(onerel, max_freespace);
> +                       fsm_updated_pages = 0;
> +                       max_freespace = 0;
> +               }
> +
> +
> +               /*
>
> @@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options,
> LVRelStats *vacrelstats,
>
> vmbuffer, InvalidTransactionId,
>
> VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
>                                 END_CRIT_SECTION();
> +
>                         }
>
> 0002 patch looks good to me.
>
> For 0003 patch, I think fsm_set() function name could lead misreading
> because it actually not only sets the value but also returns the
> value. fsm_set_and_get() might be better?

Attached is the patchset modified as requested.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Tue, Feb 6, 2018 at 4:56 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> For vacuuming fsm of index, we might have to consider to
> vacuum fsm of index after lazy_vacuum_index.

I've been thinking about that, and I think you're right.

So here's a fourth patch that adds that to nbtree's bulkdelete implementation.
It seems to be the best place to add such a thing.

GIN and GiST don't delete pages until vacuumcleanup, so they can't do
the same, sadly.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Wed, Feb 28, 2018 at 12:06 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Feb 26, 2018 at 10:20 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Thank you for updating patches!
>>
>> 0001 patch looks good to me except for the following unnecessary empty lines.
>>
>> +                * If there are no indexes then we should periodically
>> vacuum the FSM
>> +                * on huge relations to make free space visible early.
>> +                */
>> +               else if (nindexes == 0 && fsm_updated_pages >
>> vacuum_fsm_every_pages)
>> +               {
>> +                       /* Vacuum the Free Space Map */
>> +                       FreeSpaceMapVacuum(onerel, max_freespace);
>> +                       fsm_updated_pages = 0;
>> +                       max_freespace = 0;
>> +               }
>> +
>> +
>> +               /*
>>
>> @@ -913,10 +967,14 @@ lazy_scan_heap(Relation onerel, int options,
>> LVRelStats *vacrelstats,
>>
>> vmbuffer, InvalidTransactionId,
>>
>> VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
>>                                 END_CRIT_SECTION();
>> +
>>                         }
>>
>> 0002 patch looks good to me.
>>
>> For 0003 patch, I think fsm_set() function name could lead misreading
>> because it actually not only sets the value but also returns the
>> value. fsm_set_and_get() might be better?
>
> Attached is the patchset modified as requested

Thank you for updating the patches!

+/*
+ * When a table has no indexes, vacuum the FSM at most every this
+ * many dirty pages. With a default page size of 8kb, this value
+ * basically means 8GB of dirtied pages.
+ */
+#define VACUUM_FSM_EVERY_PAGES 1048576

Is it better to vacuum the FSM every 8GB regardless of the block size?
An installation that uses a block size larger than 8KB will have fewer
pages than an 8KB-block installation of the same data size, so the
current patch might delay FSM vacuuming there. What do you think? If
it's better, we can define it as follows.

#define VACUUM_FSM_EVERY_PAGES (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ)
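As a sanity check of the arithmetic (my own numbers, not from the patch): 8GB of dirtied pages expressed as a page count for a few block sizes; at the default 8KB block size this reproduces the hard-coded 1048576.

```python
GiB = 1024 ** 3

def vacuum_fsm_every_pages(blcksz):
    """Pages in 8 GiB of dirtied data for a given block size."""
    return 8 * GiB // blcksz

for blcksz in (8192, 16384, 32768):
    print(blcksz, vacuum_fsm_every_pages(blcksz))
# 8192  -> 1048576  (matches the hard-coded constant)
# 16384 -> 524288
# 32768 -> 262144
```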

--
@@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
     TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
     TransactionId relminmxid = onerel->rd_rel->relminmxid;
     BlockNumber empty_pages,
-                vacuumed_pages;
+                vacuumed_pages,
+                fsm_updated_pages,
+                vacuum_fsm_every_pages;
     double        num_tuples,
                 tups_vacuumed,
                 nkeep,

Regarding the fsm_updated_pages variable name, I think
freespace_updated_pages would be better, because it actually counts the
pages whose free space was updated, not FSM pages.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you for updating the patches!
>
> +/*
> + * When a table has no indexes, vacuum the FSM at most every this
> + * many dirty pages. With a default page size of 8kb, this value
> + * basically means 8GB of dirtied pages.
> + */
> +#define VACUUM_FSM_EVERY_PAGES 1048576
>
> Is it better if we vacuum fsm every 8GB regardless of the block size?
> Because an installation that uses >8GB block size is likely to have
> the pages less than what an 8GB block size installation has, the
> current patch might lead to delay fsm vacuum. What do you think? If
> it's better, we can define it as follows.
>
> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)

I honestly don't know. I've never tweaked the page size, and I know
nobody who has, so I have no experience with those installations.

Let's go with your proposal.

>
> --
> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options,
> LVRelStats *vacrelstats,
>      TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
>      TransactionId relminmxid = onerel->rd_rel->relminmxid;
>      BlockNumber empty_pages,
> -                vacuumed_pages;
> +                vacuumed_pages,
> +                fsm_updated_pages,
> +                vacuum_fsm_every_pages;
>      double        num_tuples,
>                  tups_vacuumed,
>                  nkeep,
>
> Regarding fsm_updated_pages variable name, I think it's better to be
> freespace_updated_pages because we actually counts the page updated
> its freespace, not fsm.

Good point.

New version attached.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Thank you for updating the patches!
>>
>> +/*
>> + * When a table has no indexes, vacuum the FSM at most every this
>> + * many dirty pages. With a default page size of 8kb, this value
>> + * basically means 8GB of dirtied pages.
>> + */
>> +#define VACUUM_FSM_EVERY_PAGES 1048576
>>
>> Is it better if we vacuum fsm every 8GB regardless of the block size?
>> Because an installation that uses >8GB block size is likely to have
>> the pages less than what an 8GB block size installation has, the
>> current patch might lead to delay fsm vacuum. What do you think? If
>> it's better, we can define it as follows.
>>
>> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
>
> I honestly don't know. I've never tweaked page size, and know nobody
> that did, so I have no experience with those installations.
>
> Lets go with your proposal.
>
>>
>> --
>> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options,
>> LVRelStats *vacrelstats,
>>      TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
>>      TransactionId relminmxid = onerel->rd_rel->relminmxid;
>>      BlockNumber empty_pages,
>> -                vacuumed_pages;
>> +                vacuumed_pages,
>> +                fsm_updated_pages,
>> +                vacuum_fsm_every_pages;
>>      double        num_tuples,
>>                  tups_vacuumed,
>>                  nkeep,
>>
>> Regarding the fsm_updated_pages variable name, I think it would be better
>> as freespace_updated_pages, because we actually count the pages whose
>> free space was updated, not the FSM.
>
> Good point.
>
> New version attached.

Sorry, forgot to actually squash. Now the real new version is attached.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Thank you for updating the patches!
>>>
>>> +/*
>>> + * When a table has no indexes, vacuum the FSM at most every this
>>> + * many dirty pages. With a default page size of 8kb, this value
>>> + * basically means 8GB of dirtied pages.
>>> + */
>>> +#define VACUUM_FSM_EVERY_PAGES 1048576
>>>
>>> Is it better if we vacuum fsm every 8GB regardless of the block size?
>>> Because an installation that uses a block size larger than 8kB will have
>>> fewer pages than an 8kB-block-size installation for the same amount of
>>> data, the current patch might delay the FSM vacuum. What do you think? If
>>> it's better, we can define it as follows.
>>>
>>> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
>>
>> I honestly don't know. I've never tweaked page size, and know nobody
>> that did, so I have no experience with those installations.
>>
>> Let's go with your proposal.
>>
>>>
>>> --
>>> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options,
>>> LVRelStats *vacrelstats,
>>>      TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
>>>      TransactionId relminmxid = onerel->rd_rel->relminmxid;
>>>      BlockNumber empty_pages,
>>> -                vacuumed_pages;
>>> +                vacuumed_pages,
>>> +                fsm_updated_pages,
>>> +                vacuum_fsm_every_pages;
>>>      double        num_tuples,
>>>                  tups_vacuumed,
>>>                  nkeep,
>>>
>>> Regarding the fsm_updated_pages variable name, I think it would be better
>>> as freespace_updated_pages, because we actually count the pages whose
>>> free space was updated, not the FSM.
>>
>> Good point.
>>
>> New version attached.
>
> Sorry, forgot to actually squash. Now the real new version is attached.

Thank you for updating. I think the patches have had enough discussion and
are of sufficient quality to go to committer review. I've marked them as
Ready for Committer.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Masahiko Sawada
Date:
On Mon, Mar 5, 2018 at 10:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> Thank you for updating the patches!
>>>>
>>>> +/*
>>>> + * When a table has no indexes, vacuum the FSM at most every this
>>>> + * many dirty pages. With a default page size of 8kb, this value
>>>> + * basically means 8GB of dirtied pages.
>>>> + */
>>>> +#define VACUUM_FSM_EVERY_PAGES 1048576
>>>>
>>>> Is it better if we vacuum fsm every 8GB regardless of the block size?
>>>> Because an installation that uses a block size larger than 8kB will have
>>>> fewer pages than an 8kB-block-size installation for the same amount of
>>>> data, the current patch might delay the FSM vacuum. What do you think? If
>>>> it's better, we can define it as follows.
>>>>
>>>> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
>>>
>>> I honestly don't know. I've never tweaked page size, and know nobody
>>> that did, so I have no experience with those installations.
>>>
>>> Let's go with your proposal.
>>>
>>>>
>>>> --
>>>> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options,
>>>> LVRelStats *vacrelstats,
>>>>      TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
>>>>      TransactionId relminmxid = onerel->rd_rel->relminmxid;
>>>>      BlockNumber empty_pages,
>>>> -                vacuumed_pages;
>>>> +                vacuumed_pages,
>>>> +                fsm_updated_pages,
>>>> +                vacuum_fsm_every_pages;
>>>>      double        num_tuples,
>>>>                  tups_vacuumed,
>>>>                  nkeep,
>>>>
>>>> Regarding the fsm_updated_pages variable name, I think it would be better
>>>> as freespace_updated_pages, because we actually count the pages whose
>>>> free space was updated, not the FSM.
>>>
>>> Good point.
>>>
>>> New version attached.
>>
>> Sorry, forgot to actually squash. Now the real new version is attached.
>
> Thank you for updating. I think the patches have had enough discussion and
> are of sufficient quality to go to committer review. I've marked them as
> Ready for Committer.
>

Oops, the following comment needs to be updated. Since it would be
better to defer the decision on this behavior to the committers, I'll keep
the status of this patch as Ready for Committer.

+/*
+ * When a table has no indexes, vacuum the FSM at most every this
+ * many dirty pages. With a default page size of 8kb, this value
+ * basically means 8GB of dirtied pages.
+ */
+#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
+

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Sun, Mar 4, 2018 at 10:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 10:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>> On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> Thank you for updating the patches!
>>>>>
>>>>> +/*
>>>>> + * When a table has no indexes, vacuum the FSM at most every this
>>>>> + * many dirty pages. With a default page size of 8kb, this value
>>>>> + * basically means 8GB of dirtied pages.
>>>>> + */
>>>>> +#define VACUUM_FSM_EVERY_PAGES 1048576
>>>>>
>>>>> Is it better if we vacuum fsm every 8GB regardless of the block size?
>>>>> Because an installation that uses a block size larger than 8kB will have
>>>>> fewer pages than an 8kB-block-size installation for the same amount of
>>>>> data, the current patch might delay the FSM vacuum. What do you think? If
>>>>> it's better, we can define it as follows.
>>>>>
>>>>> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
>>>>
>>>> I honestly don't know. I've never tweaked page size, and know nobody
>>>> that did, so I have no experience with those installations.
>>>>
>>>> Let's go with your proposal.
>>>>
>>>>>
>>>>> --
>>>>> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options,
>>>>> LVRelStats *vacrelstats,
>>>>>      TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
>>>>>      TransactionId relminmxid = onerel->rd_rel->relminmxid;
>>>>>      BlockNumber empty_pages,
>>>>> -                vacuumed_pages;
>>>>> +                vacuumed_pages,
>>>>> +                fsm_updated_pages,
>>>>> +                vacuum_fsm_every_pages;
>>>>>      double        num_tuples,
>>>>>                  tups_vacuumed,
>>>>>                  nkeep,
>>>>>
>>>>> Regarding the fsm_updated_pages variable name, I think it would be better
>>>>> as freespace_updated_pages, because we actually count the pages whose
>>>>> free space was updated, not the FSM.
>>>>
>>>> Good point.
>>>>
>>>> New version attached.
>>>
>>> Sorry, forgot to actually squash. Now the real new version is attached.
>>
>> Thank you for updating. I think the patches have had enough discussion and
>> are of sufficient quality to go to committer review. I've marked them as
>> Ready for Committer.
>>
>
> Oops, the following comment needs to be updated. Since it would be
> better to defer the decision on this behavior to the committers, I'll keep
> the status of this patch as Ready for Committer.
>
> +/*
> + * When a table has no indexes, vacuum the FSM at most every this
> + * many dirty pages. With a default page size of 8kb, this value
> + * basically means 8GB of dirtied pages.
> + */
> +#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
> +

I just noticed that it actually computes 8MB (not 8GB) of pages.

Attached a fix for that and the comment update.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ]

I hadn't paid any attention to this patch previously, so maybe I'm
missing something ... but this sure seems like a very bizarre approach
to the problem.  If the idea is to fix the FSM's upper levels after
vacuuming a known sub-range of the table, why would you not proceed
by teaching FreeSpaceMapVacuum to recurse only within that sub-range
of page numbers?  This setup with a threshold seems entirely Rube
Goldbergian.  It's dependent on a magic threshold number that you can
only select by trial and error, and it's inevitably going to spend time
updating segments of the FSM tree that have nothing to do with the part
that's been vacuumed.

This approach would also offer a less arbitrary way to decide how often
to do the updates: choose a distance that has something to do with the
FSM's structure, so that we don't waste effort reconsidering fragments
of an upper tree page.

            regards, tom lane


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Amit Kapila
Date:
On Sun, Mar 25, 2018 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ]
>
> I hadn't paid any attention to this patch previously, so maybe I'm
> missing something ... but this sure seems like a very bizarre approach
> to the problem.  If the idea is to fix the FSM's upper levels after
> vacuuming a known sub-range of the table, why would you not proceed
> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
> of page numbers?
>

Yeah, sounds like a better idea and I think we already do something
similar during relation extension (when we add blocks in bulk) via
UpdateFreeSpaceMap.  Is that what you have in mind? It seems like
Claudio's method is somewhat more selective when deciding which
branches to traverse, but it appears more complicated than what
you are proposing.


> This setup with a threshold seems entirely Rube
> Goldbergian.  It's dependent on a magic threshold number that you can
> only select by trial and error, and it's inevitably going to spend time
> updating segments of the FSM tree that have nothing to do with the part
> that's been vacuumed.
>
> This approach would also offer a less arbitrary way to decide how often
> to do the updates: choose a distance that has something to do with the
> FSM's structure, so that we don't waste effort reconsidering fragments
> of an upper tree page.
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> [ 0001-Vacuum-Update-FSM-more-frequently-v9.patch ]
>
> I hadn't paid any attention to this patch previously, so maybe I'm
> missing something ... but this sure seems like a very bizarre approach
> to the problem.  If the idea is to fix the FSM's upper levels after
> vacuuming a known sub-range of the table, why would you not proceed
> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
> of page numbers?  This setup with a threshold seems entirely Rube
> Goldbergian.  It's dependent on a magic threshold number that you can
> only select by trial and error, and it's inevitably going to spend time
> updating segments of the FSM tree that have nothing to do with the part
> that's been vacuumed.

Well, the point is to not only update the range we know we've
vacuumed, but also to finish the updates done by a potential
previously cancelled autovacuum.

But I guess that's only for the first run. Your approach seems like a
better idea for the other runs.

Just for the record, the number isn't so magic. During vacuum, the patch
records the highest amount of free space produced, and will only recurse
into branches that don't already record that much free space. So it's
quite deterministic in those cases; only the first run has to guess, and
your approach doesn't apply to that first run.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I hadn't paid any attention to this patch previously, so maybe I'm
>> missing something ... but this sure seems like a very bizarre approach
>> to the problem.  If the idea is to fix the FSM's upper levels after
>> vacuuming a known sub-range of the table, why would you not proceed
>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>> of page numbers?  This setup with a threshold seems entirely Rube
>> Goldbergian.  It's dependent on a magic threshold number that you can
>> only select by trial and error, and it's inevitably going to spend time
>> updating segments of the FSM tree that have nothing to do with the part
>> that's been vacuumed.

> Well, the point is to not only update the range we know we've
> vacuumed, but also to finish the updates done by a potential
> previously cancelled autovacuum.

I think that's not an important consideration, or at least would stop
being one after a patch like this.  The reason it's a problem now is
precisely that we don't try to vacuum the FSM till the very end; if
we did FSM cleanup every X pages --- in particular, before not after
the final relation-truncation attempt --- there wouldn't be risk of
skipping so much FSM work that we need to worry about forcing the
issue just in case there'd been a prior cancellation.

(Of course, you'd still need to do something after the truncation
step to truncate the FSM, but I'm arguing it should *only* respond
to that, not have to clean up all the rest of the FSM state.)

            regards, tom lane


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I hadn't paid any attention to this patch previously, so maybe I'm
>>> missing something ... but this sure seems like a very bizarre approach
>>> to the problem.  If the idea is to fix the FSM's upper levels after
>>> vacuuming a known sub-range of the table, why would you not proceed
>>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>>> of page numbers?  This setup with a threshold seems entirely Rube
>>> Goldbergian.  It's dependent on a magic threshold number that you can
>>> only select by trial and error, and it's inevitably going to spend time
>>> updating segments of the FSM tree that have nothing to do with the part
>>> that's been vacuumed.
>
>> Well, the point is to not only update the range we know we've
>> vacuumed, but also to finish the updates done by a potential
>> previously cancelled autovacuum.
>
> I think that's not an important consideration, or at least would stop
> being one after a patch like this.  The reason it's a problem now is
> precisely that we don't try to vacuum the FSM till the very end; if
> we did FSM cleanup every X pages --- in particular, before not after
> the final relation-truncation attempt --- there wouldn't be risk of
> skipping so much FSM work that we need to worry about forcing the
> issue just in case there'd been a prior cancellation.

I'm thinking that, in conjunction with the high-MWM patch for vacuum,
it could still happen that a considerable amount of vacuum work is left
unexposed upon cancellation.

The "bizarre" approach provides some relief.

I'll see about implementing the FSM range-vacuuming operation for
non-initial runs; there seems to be consensus that it's a good idea.

But I still think the partial run at the very beginning is useful.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Claudio Freire <klaussfreire@gmail.com> writes:
>>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I hadn't paid any attention to this patch previously, so maybe I'm
>>>> missing something ... but this sure seems like a very bizarre approach
>>>> to the problem.  If the idea is to fix the FSM's upper levels after
>>>> vacuuming a known sub-range of the table, why would you not proceed
>>>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>>>> of page numbers?  This setup with a threshold seems entirely Rube
>>>> Goldbergian.  It's dependent on a magic threshold number that you can
>>>> only select by trial and error, and it's inevitably going to spend time
>>>> updating segments of the FSM tree that have nothing to do with the part
>>>> that's been vacuumed.
>>
>>> Well, the point is to not only update the range we know we've
>>> vacuumed, but also to finish the updates done by a potential
>>> previously cancelled autovacuum.
>>
>> I think that's not an important consideration, or at least would stop
>> being one after a patch like this.  The reason it's a problem now is
>> precisely that we don't try to vacuum the FSM till the very end; if
>> we did FSM cleanup every X pages --- in particular, before not after
>> the final relation-truncation attempt --- there wouldn't be risk of
>> skipping so much FSM work that we need to worry about forcing the
>> issue just in case there'd been a prior cancellation.
>
> I'm thinking that, in conjunction with the high-MWM patch for vacuum,
> it could still happen that a considerable amount of vacuum work is left
> unexposed upon cancellation.
>
> The "bizarre" approach provides some relief.
>
> I'll see about implementing the FSM range-vacuuming operation for
> non-initial runs; there seems to be consensus that it's a good idea.
>
> But I still think the partial run at the very beginning is useful.

Attached patches, rebased and modified as discussed:

1 no longer does tree pruning; it just vacuums a range of the FSM

2 reintroduces tree pruning for the initial FSM vacuum

3 and 4 remain as they were, but rebased

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Mon, Mar 26, 2018 at 2:46 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Mar 26, 2018 at 11:26 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Mar 26, 2018 at 11:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Claudio Freire <klaussfreire@gmail.com> writes:
>>>> On Sat, Mar 24, 2018 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> I hadn't paid any attention to this patch previously, so maybe I'm
>>>>> missing something ... but this sure seems like a very bizarre approach
>>>>> to the problem.  If the idea is to fix the FSM's upper levels after
>>>>> vacuuming a known sub-range of the table, why would you not proceed
>>>>> by teaching FreeSpaceMapVacuum to recurse only within that sub-range
>>>>> of page numbers?  This setup with a threshold seems entirely Rube
>>>>> Goldbergian.  It's dependent on a magic threshold number that you can
>>>>> only select by trial and error, and it's inevitably going to spend time
>>>>> updating segments of the FSM tree that have nothing to do with the part
>>>>> that's been vacuumed.
>>>
>>>> Well, the point is to not only update the range we know we've
>>>> vacuumed, but also to finish the updates done by a potential
>>>> previously cancelled autovacuum.
>>>
>>> I think that's not an important consideration, or at least would stop
>>> being one after a patch like this.  The reason it's a problem now is
>>> precisely that we don't try to vacuum the FSM till the very end; if
>>> we did FSM cleanup every X pages --- in particular, before not after
>>> the final relation-truncation attempt --- there wouldn't be risk of
>>> skipping so much FSM work that we need to worry about forcing the
>>> issue just in case there'd been a prior cancellation.
>>
>> I'm thinking that, in conjunction with the high-MWM patch for vacuum,
>> it could still happen that a considerable amount of vacuum work is left
>> unexposed upon cancellation.
>>
>> The "bizarre" approach provides some relief.
>>
>> I'll see about implementing the FSM range-vacuuming operation for
>> non-initial runs; there seems to be consensus that it's a good idea.
>>
>> But I still think the partial run at the very beginning is useful.
>
> Attached patches, rebased and modified as discussed:
>
> 1 no longer does tree pruning; it just vacuums a range of the FSM
>
> 2 reintroduces tree pruning for the initial FSM vacuum
>
> 3 and 4 remain as they were, but rebased

Sorry, ignore patch number 3 in the earlier mail; I selected the wrong
patch to attach.

Attached now is the correct number 3.

Attachment

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> Attached patches, rebased and modified as discussed:
> 1 no longer does tree pruning, it just vacuums a range of the FSM
> 2 reintroduces tree pruning for the initial FSM vacuum
> 3 and 4 remain as they were, but rebased

I reviewed and cleaned up 0001.  The API for FreeSpaceMapVacuumRange was
underspecified, and the use of it seemed to have off-by-one errors.  Also,
you still had vacuum doing a full FreeSpaceMapVacuum call at the end;
I thought the point was to get rid of that.  We then need to make sure
we clean up after a truncation, but we can do that by introducing a
FreeSpaceMapVacuumRange call into FreeSpaceMapTruncateRel.  I think the
attached 0001 is committable, but feel free to take another look.

I still don't like 0002.  It's adding a lot of complexity, and
not-negligible overhead, to solve yesterday's problem.  After 0001,
there's no reason to assume that vacuum is particularly likely to get
cancelled between having made cleanups and having updated the upper FSM
levels.  (Maybe the odds are a bit more for the no-indexes case, but
that doesn't seem like it's common enough to justify a special mechanism
either.)

Not sure what to think about 0003.  At this point I'd be inclined
to flush UpdateFreeSpaceMap altogether and use FreeSpaceMapVacuumRange
in its place.  I don't think the design of that function is any better
chosen than its name, and possible bugs in its subroutines don't make
it more attractive.

Not sure about 0004 either.  The fact that we can't localize what part of
the index needs to be updated means that repeated IndexFreeSpaceMapVacuum
calls are a roughly quadratic cost.  Maybe in proportion to the other work
we have to do, they're not much, but on the other hand how much benefit is
there?  Should we make the call conditional on how many index pages got
updated?  Also, I wonder why you only touched nbtree and spgist.  (For
that matter, I wonder why BRIN doesn't go through IndexFreeSpaceMapVacuum
like the rest of the index AMs.  Or perhaps it has the right idea and we
should get rid of IndexFreeSpaceMapVacuum as a useless layer.)

            regards, tom lane

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index f9da24c..d2a0066 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***************
*** 85,90 ****
--- 85,99 ----
  #define VACUUM_TRUNCATE_LOCK_TIMEOUT            5000    /* ms */

  /*
+  * When a table has no indexes, vacuum the FSM after every 8GB, approximately
+  * (it won't be exact because we only vacuum FSM after processing a heap page
+  * that has some removable tuples).  When there are indexes, this is ignored,
+  * and we vacuum FSM after each index/heap cleaning pass.
+  */
+ #define VACUUM_FSM_EVERY_PAGES \
+     ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+
+ /*
   * Guesstimation of number of dead tuples per page.  This is used to
   * provide an upper limit to memory allocated when vacuuming small
   * tables.
*************** lazy_vacuum_rel(Relation onerel, int opt
*** 285,293 ****
      pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
                                   PROGRESS_VACUUM_PHASE_FINAL_CLEANUP);

-     /* Vacuum the Free Space Map */
-     FreeSpaceMapVacuum(onerel);
-
      /*
       * Update statistics in pg_class.
       *
--- 294,299 ----
*************** lazy_scan_heap(Relation onerel, int opti
*** 465,471 ****
      TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
      TransactionId relminmxid = onerel->rd_rel->relminmxid;
      BlockNumber empty_pages,
!                 vacuumed_pages;
      double        num_tuples,        /* total number of nonremovable tuples */
                  live_tuples,    /* live tuples (reltuples estimate) */
                  tups_vacuumed,    /* tuples cleaned up by vacuum */
--- 471,478 ----
      TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
      TransactionId relminmxid = onerel->rd_rel->relminmxid;
      BlockNumber empty_pages,
!                 vacuumed_pages,
!                 next_fsm_block_to_vacuum;
      double        num_tuples,        /* total number of nonremovable tuples */
                  live_tuples,    /* live tuples (reltuples estimate) */
                  tups_vacuumed,    /* tuples cleaned up by vacuum */
*************** lazy_scan_heap(Relation onerel, int opti
*** 501,506 ****
--- 508,514 ----
                          relname)));

      empty_pages = vacuumed_pages = 0;
+     next_fsm_block_to_vacuum = (BlockNumber) 0;
      num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;

      indstats = (IndexBulkDeleteResult **)
*************** lazy_scan_heap(Relation onerel, int opti
*** 752,757 ****
--- 760,772 ----
              vacrelstats->num_dead_tuples = 0;
              vacrelstats->num_index_scans++;

+             /*
+              * Vacuum the Free Space Map to make newly-freed space visible on
+              * upper-level FSM pages.  Note we have not yet processed blkno.
+              */
+             FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+             next_fsm_block_to_vacuum = blkno;
+
              /* Report that we are once again scanning the heap */
              pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
                                           PROGRESS_VACUUM_PHASE_SCAN_HEAP);
*************** lazy_scan_heap(Relation onerel, int opti
*** 1200,1205 ****
--- 1215,1233 ----
               */
              vacrelstats->num_dead_tuples = 0;
              vacuumed_pages++;
+
+             /*
+              * Periodically do incremental FSM vacuuming to make newly-freed
+              * space visible on upper FSM pages.  Note: although we've cleaned
+              * the current block, we haven't yet updated its FSM entry (that
+              * happens further down), so passing end == blkno is correct.
+              */
+             if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+             {
+                 FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
+                                         blkno);
+                 next_fsm_block_to_vacuum = blkno;
+             }
          }

          freespace = PageGetHeapFreeSpace(page);
*************** lazy_scan_heap(Relation onerel, int opti
*** 1368,1373 ****
--- 1396,1408 ----
          vacrelstats->num_index_scans++;
      }

+     /*
+      * Vacuum the remainder of the Free Space Map.  We must do this whether or
+      * not there were indexes.
+      */
+     if (blkno > next_fsm_block_to_vacuum)
+         FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+
      /* report all blocks vacuumed; and that we're cleaning up */
      pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
      pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
diff --git a/src/backend/storage/freespace/README b/src/backend/storage/freespace/README
index bbd1b93..e7ff23b 100644
*** a/src/backend/storage/freespace/README
--- b/src/backend/storage/freespace/README
*************** have a corrupted page, with a parent som
*** 180,192 ****
  Secondly, if we detect corrupted pages while we search, traversing down
  the tree. That check will notice if a parent node is set to too high a value.
  In both cases, the upper nodes on the page are immediately rebuilt, fixing
! the corruption.

! Vacuum updates all the bottom level pages with correct amount of free space
! on the heap pages, fixing any outdated values there. After the heap and
! index passes are done, FreeSpaceMapVacuum is called, and the FSM tree is
! scanned in depth-first order. This fixes any discrepancies between upper
! and lower level FSM pages.

  TODO
  ----
--- 180,192 ----
  Secondly, if we detect corrupted pages while we search, traversing down
  the tree. That check will notice if a parent node is set to too high a value.
  In both cases, the upper nodes on the page are immediately rebuilt, fixing
! the corruption so far as that page is concerned.

! VACUUM updates all the bottom-level FSM pages with the correct amount of free
! space on corresponding heap pages, as it proceeds through the heap.  This
! goes through fsm_set_avail(), so that the upper nodes on those pages are
! immediately updated.  Periodically, VACUUM calls FreeSpaceMapVacuum[Range]
! to propagate the new free-space info into the upper pages of the FSM tree.

  TODO
  ----
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index dd8ae52..8ae981f 100644
*** a/src/backend/storage/freespace/freespace.c
--- b/src/backend/storage/freespace/freespace.c
*************** static Size fsm_space_cat_to_avail(uint8
*** 108,114 ****
  static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
                     uint8 newValue, uint8 minValue);
  static BlockNumber fsm_search(Relation rel, uint8 min_cat);
! static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof);
  static BlockNumber fsm_get_lastblckno(Relation rel, FSMAddress addr);
  static void fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat);

--- 108,116 ----
  static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
                     uint8 newValue, uint8 minValue);
  static BlockNumber fsm_search(Relation rel, uint8 min_cat);
! static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr,
!                 BlockNumber start, BlockNumber end,
!                 bool *eof);
  static BlockNumber fsm_get_lastblckno(Relation rel, FSMAddress addr);
  static void fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat);

*************** FreeSpaceMapTruncateRel(Relation rel, Bl
*** 370,390 ****
       */
      if (rel->rd_smgr)
          rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
  }

  /*
!  * FreeSpaceMapVacuum - scan and fix any inconsistencies in the FSM
   */
  void
  FreeSpaceMapVacuum(Relation rel)
  {
      bool        dummy;

!     /*
!      * Traverse the tree in depth-first order. The tree is stored physically
!      * in depth-first order, so this should be pretty I/O efficient.
!      */
!     fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, &dummy);
  }

  /******** Internal routines ********/
--- 372,418 ----
       */
      if (rel->rd_smgr)
          rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+
+     /*
+      * Update upper-level FSM pages to account for the truncation.  This is
+      * important because the just-truncated pages were likely marked as
+      * all-free, and would be preferentially selected.
+      */
+     FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
  }

  /*
!  * FreeSpaceMapVacuum - update upper-level pages in the rel's FSM
!  *
!  * We assume that the bottom-level pages have already been updated with
!  * new free-space information.
   */
  void
  FreeSpaceMapVacuum(Relation rel)
  {
      bool        dummy;

!     /* Recursively scan the tree, starting at the root */
!     (void) fsm_vacuum_page(rel, FSM_ROOT_ADDRESS,
!                            (BlockNumber) 0, InvalidBlockNumber,
!                            &dummy);
! }
!
! /*
!  * FreeSpaceMapVacuumRange - update upper-level pages in the rel's FSM
!  *
!  * As above, but assume that only heap pages between start and end-1 inclusive
!  * have new free-space information, so update only the upper-level slots
!  * covering that block range.  end == InvalidBlockNumber is equivalent to
!  * "all the rest of the relation".
!  */
! void
! FreeSpaceMapVacuumRange(Relation rel, BlockNumber start, BlockNumber end)
! {
!     bool        dummy;
!
!     /* Recursively scan the tree, starting at the root */
!     (void) fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, start, end, &dummy);
  }

  /******** Internal routines ********/
*************** fsm_search(Relation rel, uint8 min_cat)
*** 783,791 ****

  /*
   * Recursive guts of FreeSpaceMapVacuum
   */
  static uint8
! fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
  {
      Buffer        buf;
      Page        page;
--- 811,831 ----

  /*
   * Recursive guts of FreeSpaceMapVacuum
+  *
+  * Examine the FSM page indicated by addr, as well as its children, updating
+  * upper-level nodes that cover the heap block range from start to end-1.
+  * (It's okay if end is beyond the actual end of the map.)
+  * Return the maximum freespace value on this page.
+  *
+  * If addr is past the end of the FSM, set *eof_p to true and return 0.
+  *
+  * This traverses the tree in depth-first order.  The tree is stored
+  * physically in depth-first order, so this should be pretty I/O efficient.
   */
  static uint8
! fsm_vacuum_page(Relation rel, FSMAddress addr,
!                 BlockNumber start, BlockNumber end,
!                 bool *eof_p)
  {
      Buffer        buf;
      Page        page;
*************** fsm_vacuum_page(Relation rel, FSMAddress
*** 804,818 ****
      page = BufferGetPage(buf);

      /*
!      * Recurse into children, and fix the information stored about them at
!      * this level.
       */
      if (addr.level > FSM_BOTTOM_LEVEL)
      {
!         int            slot;
          bool        eof = false;

!         for (slot = 0; slot < SlotsPerFSMPage; slot++)
          {
              int            child_avail;

--- 844,894 ----
      page = BufferGetPage(buf);

      /*
!      * If we're above the bottom level, recurse into children, and fix the
!      * information stored about them at this level.
       */
      if (addr.level > FSM_BOTTOM_LEVEL)
      {
!         FSMAddress    fsm_start,
!                     fsm_end;
!         uint16        fsm_start_slot,
!                     fsm_end_slot;
!         int            slot,
!                     start_slot,
!                     end_slot;
          bool        eof = false;

!         /*
!          * Compute the range of slots we need to update on this page, given
!          * the requested range of heap blocks to consider.  (Some of this work
!          * will be duplicated in each recursive call, but it's cheap enough to
!          * not worry about.)
!          */
!         fsm_start = fsm_get_location(start, &fsm_start_slot);
!         fsm_end = fsm_get_location(end, &fsm_end_slot);
!
!         while (fsm_start.level < addr.level)
!         {
!             fsm_start = fsm_get_parent(fsm_start, &fsm_start_slot);
!             fsm_end = fsm_get_parent(fsm_end, &fsm_end_slot);
!         }
!         Assert(fsm_start.level == addr.level);
!
!         if (fsm_start.logpageno == addr.logpageno)
!             start_slot = fsm_start_slot;
!         else if (fsm_start.logpageno > addr.logpageno)
!             start_slot = SlotsPerFSMPage;
!         else
!             start_slot = 0;
!
!         if (fsm_end.logpageno == addr.logpageno)
!             end_slot = fsm_end_slot;
!         else if (fsm_end.logpageno > addr.logpageno)
!             end_slot = SlotsPerFSMPage;
!         else
!             end_slot = -1;
!
!         for (slot = start_slot; slot < end_slot; slot++)
          {
              int            child_avail;

*************** fsm_vacuum_page(Relation rel, FSMAddress
*** 820,826 ****

              /* After we hit end-of-file, just clear the rest of the slots */
              if (!eof)
!                 child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), &eof);
              else
                  child_avail = 0;

--- 896,904 ----

              /* After we hit end-of-file, just clear the rest of the slots */
              if (!eof)
!                 child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot),
!                                               start, end,
!                                               &eof);
              else
                  child_avail = 0;

*************** fsm_vacuum_page(Relation rel, FSMAddress
*** 835,840 ****
--- 913,919 ----
          }
      }

+     /* Now get the maximum value on the page, to return to caller */
      max_avail = fsm_get_max_avail(BufferGetPage(buf));

      /*
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index a517d7e..a203251 100644
*** a/src/include/storage/freespace.h
--- b/src/include/storage/freespace.h
*************** extern void XLogRecordPageWithFreeSpace(
*** 32,37 ****
--- 32,39 ----

  extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
  extern void FreeSpaceMapVacuum(Relation rel);
+ extern void FreeSpaceMapVacuumRange(Relation rel, BlockNumber start,
+                         BlockNumber end);
  extern void UpdateFreeSpaceMap(Relation rel,
                     BlockNumber startBlkNum,
                     BlockNumber endBlkNum,

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> Attached patches, rebased and modified as discussed:
>> 1 no longer does tree pruning, it just vacuums a range of the FSM
>> 2 reintroduces tree pruning for the initial FSM vacuum
>> 3 and 4 remain as they were, but rebased
>
> I reviewed and cleaned up 0001.  The API for FreeSpaceMapVacuumRange was
> underspecified, and the use of it seemed to have off-by-one errors.  Also,
> you still had vacuum doing a full FreeSpaceMapVacuum call at the end;
> I thought the point was to get rid of that.  We then need to make sure
> we clean up after a truncation, but we can do that by introducing a
> FreeSpaceMapVacuumRange call into FreeSpaceMapTruncateRel.  I think the
> attached 0001 is committable, but feel free to take another look.

+
+             /*
+              * Periodically do incremental FSM vacuuming to make newly-freed
+              * space visible on upper FSM pages.  Note: although we've cleaned
+              * the current block, we haven't yet updated its FSM entry (that
+              * happens further down), so passing end == blkno is correct.
+              */
+             if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+             {
+                 FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
+                                         blkno);
+                 next_fsm_block_to_vacuum = blkno;
+             }
          }

Using next_fsm_block_to_vacuum there seems strange.

v10 counted the number of blocks with updated free space, and vacuumed the
FSM only after a lot of changes had been made to it. This version vacuums
the FSM after *scanning* a lot of pages, even if few modifications were
made to them. Not sure which is better, but the logic in v10 sounded
right to me.

!         if (fsm_end.logpageno == addr.logpageno)
!             end_slot = fsm_end_slot;
!         else if (fsm_end.logpageno > addr.logpageno)
!             end_slot = SlotsPerFSMPage;
!         else
!             end_slot = -1;
!
!         for (slot = start_slot; slot < end_slot; slot++)

This doesn't seem correct.

The +1 in v10 was intentional. Suppose the end block's FSM leaf page hangs
off slot 3 of its parent, and that parent page in turn hangs off slot 10
of the next level up.

At that upper level, this loop would vacuum slots 0-9 but not slot 10.
That would leave a whole branch (the one containing the end block)
unvisited.

That's why end_slot has to be inclusive of the end block: we still have
work to do recursing into the end_slot's branch.


> I still don't like 0002.  It's adding a lot of complexity, and
> not-negligible overhead, to solve yesterday's problem.

Are you sure it's non-negligible?

Most of my benchmarks couldn't measure any change whatsoever after
this patch in regards to run/cpu time.

The FSM is so much smaller than the table that the cost of vacuuming it
is drowned out by all the other work done and buried in the noise.

Maybe in very special cases where vacuum does almost nothing, skipping all
pages, it might be measurable. A heavily bloated-then-cleaned table with
few live rows per page, all of them frozen, would be the total worst case.
But reading the visibility map to skip pages is comparable work to
vacuuming the FSM, so there's no reason to think it would worsen by
*that* much. I might try to benchmark that a bit after the long
weekend.

Anyway, it's a separate patch to be independently committed/vetted.

> After 0001,
> there's no reason to assume that vacuum is particularly likely to get
> cancelled between having made cleanups and having updated the upper FSM
> levels.  (Maybe the odds are a bit more for the no-indexes case, but
> that doesn't seem like it's common enough to justify a special mechanism
> either.)

Why not?

Any kind of DDL (even those that don't rewrite the heap) would cancel
autovacuum.

You might think DDL isn't common enough to worry about, but I've seen
cases where regular reindex were required to keep index bloat in check
(and were cron'd), and those cancel autovacuum all the time.

> Not sure about 0004 either.  The fact that we can't localize what part of
> the index needs to be updated means that repeated IndexFreeSpaceMapVacuum
> calls are a roughly quadratic cost.

Well, the index bulk delete itself already is a huge elephant-sized
quadratic cost.

The FSM is only the cherry on top.

The updated part can't be localized because it isn't local: all blocks
could potentially be changed. Even in correlated indexes, the upper levels
need not be physically correlated, which would defeat a "vacuum this block
range" optimization.

I could try to optimize the cases where it is possible, by recording the
first and last blocks modified, but that would be a hugely invasive
patch (it would have to touch a lot of places in the btree code).

And index bloat is a very real issue, as bad as heap bloat is.

>  Maybe in proportion to the other work
> we have to do, they're not much, but on the other hand how much benefit is
> there?

A week-long multi-pass vacuum I was forced to do about a month ago
accumulated 400GB of *index bloat* because of this. To put into
context, the index was ~800GB, so close to 50% bloat.

Granted, with this patch it would still accumulate bloat, just not
400GB. Just 400GB / # of passes. It's still a decent improvement I
would say. And perhaps there was something else wrong in my
installation. But the patch would indeed have helped a lot.

>  Should we make the call conditional on how many index pages got
> updated?

It could be attempted.

>  Also, I wonder why you only touched nbtree and spgist.

They're the only ones that use the FSM during bulk deletes, at least
as far as I could find by grepping a bit.

I was a bit surprised to find that hash doesn't use the FSM at all,
but perhaps there's a reason for that.

GIN does all FSM updates at the end during ginvacuumcleanup, so
there's nothing to do between bulkdelete calls.

>  (For
> that matter, I wonder why BRIN doesn't go through IndexFreeSpaceMapVacuum
> like the rest of the index AMs.  Or perhaps it has the right idea and we
> should get rid of IndexFreeSpaceMapVacuum as a useless layer.)

No idea.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> v10 counted the number of blocks with updated free space to vacuum the
> FSM only after a lot of changes to it were made. This will vacuum the
> FSM after *scanning* a lot of pages, even if little modifications were
> made to them.

Yes, that's exactly the point.  We still need to update the upper pages of
the FSM, regardless of exactly how many pages were changed by VACUUM
underneath.  The pages VACUUM didn't change might still be out of date in
the upper pages, since retail updates from other operations don't
immediately propagate; so there's a fixed amount of work that needs to be
done there and we might as well do it in more-or-less-predictable quanta.
I don't see any point in adding counting overhead/complexity for that.
In fact, I seriously considered just making it update after every
VACUUM_FSM_EVERY_PAGES period, but decided that for the case with indexes,
doing it right after each cleanup cycle was sensible, and then we might as
well make the no-index case look as much like that as we conveniently can.
The no-index case seems vestigial anyway; how often is that really going
to apply?  So spending a lot of complexity on it doesn't seem warranted,
especially when the argument for more complexity is at best dubious.

> This doesn't seem correct.

[ thinks for a bit... ]  Yeah, you're right, we need to round up not down
when determining the last slot to scan on the upper level.

I wonder how hard it is to avoid rounding up when the stop point actually
does fall right at a FSM page boundary.  OTOH, that might not happen often
enough to be worth sweating over.

            regards, tom lane


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After 0001,
>> there's no reason to assume that vacuum is particularly likely to get
>> cancelled between having made cleanups and having updated the upper FSM
>> levels.  (Maybe the odds are a bit more for the no-indexes case, but
>> that doesn't seem like it's common enough to justify a special mechanism
>> either.)

> Why not?

> Any kind of DDL (even those that don't rewrite the heap) would cancel
> autovacuum.

> You might think DDL isn't common enough to worry about, but I've seen
> cases where regular reindex were required to keep index bloat in check
> (and were cron'd), and those cancel autovacuum all the time.

If you've got a situation where every vacuum gets canceled partway
through, you've got bloat problems regardless, because the heap tuples are
never getting removed in the first place; worrying about whether the FSM
is up to date is pretty pointless.  The 0001 patch basically updates the
FSM as soon as it can after the tuples are actually deleted, so I think
we've made the window as small as practical, and I don't really see a need
to do extra work (and add substantial extra complexity) to deal with
missed cleanup a bit sooner.

People who are dealing with this sort of scenario a lot might be well
advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
cycles happen more often.  That's kind of costly in terms of the number
of index scans, but it reduces the amount of cleanup work that can be
lost to a cancel.

(I'd also argue that a setup such as you describe is very possibly making
things worse not better.  Perhaps the 0001 patch will go some way towards
making it less necessary to do that.)

I've pushed 0001 and a replacement for 0003.  I have to go do something
else right now, but I'll take a closer look at 0004 later.

            regards, tom lane


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Tom Lane
Date:
I wrote:
> I have to go do something
> else right now, but I'll take a closer look at 0004 later.

OK, so after studying 0004, it seems to me that we could do it more
simply as attached; that is, move the IndexFreeSpaceMapVacuum calls
into btvacuumscan/spgvacuumscan, do them only if we found any recyclable
pages, and drop the calls in btvacuumcleanup/spgvacuumcleanup altogether.

The reason I think this is sufficient is that the scans find and record
every reusable page in the index, no matter whether they were recorded
before or not.  Therefore, if we don't find any such pages, there's
nothing useful in the FSM and no particular urgency about making its
upper pages up-to-date.  It's true that if the FSM is actually corrupt,
leaving that to be fixed retail by searches is probably less efficient
than doing an IndexFreeSpaceMapVacuum call would be --- but *only* if
you assume that the problem is just in the upper pages and the leaf
pages are all fine.  That doesn't seem to be a case we should optimize
for.

I realized that the reason BRIN doesn't go through indexfsm.c is
that it's actually interested in less-than-page-size free space.
So it's using the right API.  However, it still looks to me like
we could win something there by replacing FreeSpaceMapVacuum calls
with FreeSpaceMapVacuumRange calls.  I've not wrapped my head around
the logic completely, but it looks like there are several places
where it calls FreeSpaceMapVacuum to bubble up a free-space update
for just a single page.  If so, that's much less efficient than it
could be.

            regards, tom lane

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 8158508..6fca8e3 100644
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
*************** btvacuumcleanup(IndexVacuumInfo *info, I
*** 832,840 ****
          btvacuumscan(info, stats, NULL, NULL, 0);
      }

-     /* Finally, vacuum the FSM */
-     IndexFreeSpaceMapVacuum(info->index);
-
      /*
       * It's quite possible for us to be fooled by concurrent page splits into
       * double-counting some index tuples, so disbelieve any total that exceeds
--- 832,837 ----
*************** btvacuumscan(IndexVacuumInfo *info, Inde
*** 976,981 ****
--- 973,993 ----

      MemoryContextDelete(vstate.pagedelcontext);

+     /*
+      * If we found any recyclable pages (and recorded them in the FSM), then
+      * forcibly update the upper-level FSM pages to ensure that searchers can
+      * find them.  It's possible that the pages were also found during
+      * previous scans and so this is a waste of time, but it's cheap enough
+      * relative to scanning the index that it shouldn't matter much, and
+      * making sure that free pages are available sooner not later seems
+      * worthwhile.
+      *
+      * Note that if no recyclable pages exist, we don't bother vacuuming the
+      * FSM at all.
+      */
+     if (vstate.totFreePages > 0)
+         IndexFreeSpaceMapVacuum(rel);
+
      /* update statistics */
      stats->num_pages = num_pages;
      stats->pages_free = vstate.totFreePages;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 72839cb..a83a4b5 100644
*** a/src/backend/access/spgist/spgvacuum.c
--- b/src/backend/access/spgist/spgvacuum.c
*************** spgvacuumscan(spgBulkDeleteState *bds)
*** 846,851 ****
--- 846,866 ----
      SpGistUpdateMetaPage(index);

      /*
+      * If we found any empty pages (and recorded them in the FSM), then
+      * forcibly update the upper-level FSM pages to ensure that searchers can
+      * find them.  It's possible that the pages were also found during
+      * previous scans and so this is a waste of time, but it's cheap enough
+      * relative to scanning the index that it shouldn't matter much, and
+      * making sure that free pages are available sooner not later seems
+      * worthwhile.
+      *
+      * Note that if no empty pages exist, we don't bother vacuuming the FSM at
+      * all.
+      */
+     if (bds->stats->pages_deleted > 0)
+         IndexFreeSpaceMapVacuum(index);
+
+     /*
       * Truncate index if possible
       *
       * XXX disabled because it's unsafe due to possible concurrent inserts.
*************** dummy_callback(ItemPointer itemptr, void
*** 916,922 ****
  IndexBulkDeleteResult *
  spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
  {
-     Relation    index = info->index;
      spgBulkDeleteState bds;

      /* No-op in ANALYZE ONLY mode */
--- 931,936 ----
*************** spgvacuumcleanup(IndexVacuumInfo *info,
*** 926,933 ****
      /*
       * We don't need to scan the index if there was a preceding bulkdelete
       * pass.  Otherwise, make a pass that won't delete any live tuples, but
!      * might still accomplish useful stuff with redirect/placeholder cleanup,
!      * and in any case will provide stats.
       */
      if (stats == NULL)
      {
--- 940,947 ----
      /*
       * We don't need to scan the index if there was a preceding bulkdelete
       * pass.  Otherwise, make a pass that won't delete any live tuples, but
!      * might still accomplish useful stuff with redirect/placeholder cleanup
!      * and/or FSM housekeeping, and in any case will provide stats.
       */
      if (stats == NULL)
      {
*************** spgvacuumcleanup(IndexVacuumInfo *info,
*** 940,948 ****
          spgvacuumscan(&bds);
      }

-     /* Finally, vacuum the FSM */
-     IndexFreeSpaceMapVacuum(index);
-
      /*
       * It's quite possible for us to be fooled by concurrent tuple moves into
       * double-counting some index tuples, so disbelieve any total that exceeds
--- 954,959 ----

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Thu, Mar 29, 2018 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I have to go do something
>> else right now, but I'll take a closer look at 0004 later.
>
> OK, so after studying 0004, it seems to me that we could do it more
> simply as attached; that is, move the IndexFreeSpaceMapVacuum calls
> into btvacuumscan/spgvacuumscan, do them only if we found any recyclable
> pages, and drop the calls in btvacuumcleanup/spgvacuumcleanup altogether.
>
> The reason I think this is sufficient is that the scans find and record
> every reusable page in the index, no matter whether they were recorded
> before or not.  Therefore, if we don't find any such pages, there's
> nothing useful in the FSM and no particular urgency about making its
> upper pages up-to-date.  It's true that if the FSM is actually corrupt,
> leaving that to be fixed retail by searches is probably less efficient
> than doing an IndexFreeSpaceMapVacuum call would be --- but *only* if
> you assume that the problem is just in the upper pages and the leaf
> pages are all fine.  That doesn't seem to be a case we should optimize
> for.

+1, LGTM.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Thu, Mar 29, 2018 at 2:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> After 0001,
>>> there's no reason to assume that vacuum is particularly likely to get
>>> cancelled between having made cleanups and having updated the upper FSM
>>> levels.  (Maybe the odds are a bit more for the no-indexes case, but
>>> that doesn't seem like it's common enough to justify a special mechanism
>>> either.)
>
>> Why not?
>
>> Any kind of DDL (even those that don't rewrite the heap) would cancel
>> autovacuum.
>
>> You might think DDL isn't common enough to worry about, but I've seen
>> cases where regular reindex were required to keep index bloat in check
>> (and were cron'd), and those cancel autovacuum all the time.
>
> If you've got a situation where every vacuum gets canceled partway
> through, you've got bloat problems regardless, because the heap tuples are
> never getting removed in the first place; worrying about whether the FSM
> is up to date is pretty pointless.  The 0001 patch basically updates the
> FSM as soon as it can after the tuples are actually deleted, so I think
> we've made the window as small as practical, and I don't really see a need
> to do extra work (and add substantial extra complexity) to deal with
> missed cleanup a bit sooner.
>
> People who are dealing with this sort of scenario a lot might be well
> advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
> cycles happen more often.  That's kind of costly in terms of the number
> of index scans, but it reduces the amount of cleanup work that can be
> lost to a cancel.
>
> (I'd also argue that a setup such as you describe is very possibly making
> things worse not better.  Perhaps the 0001 patch will go some way towards
> making it less necessary to do that.)

Alright, so we just drop 2.


Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From
Claudio Freire
Date:
On Tue, Apr 3, 2018 at 10:19 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Mar 29, 2018 at 2:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Claudio Freire <klaussfreire@gmail.com> writes:
>>> On Wed, Mar 28, 2018 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> After 0001,
>>>> there's no reason to assume that vacuum is particularly likely to get
>>>> cancelled between having made cleanups and having updated the upper FSM
>>>> levels.  (Maybe the odds are a bit more for the no-indexes case, but
>>>> that doesn't seem like it's common enough to justify a special mechanism
>>>> either.)
>>
>>> Why not?
>>
>>> Any kind of DDL (even those that don't rewrite the heap) would cancel
>>> autovacuum.
>>
>>> You might think DDL isn't common enough to worry about, but I've seen
>>> cases where regular reindex were required to keep index bloat in check
>>> (and were cron'd), and those cancel autovacuum all the time.
>>
>> If you've got a situation where every vacuum gets canceled partway
>> through, you've got bloat problems regardless, because the heap tuples are
>> never getting removed in the first place; worrying about whether the FSM
>> is up to date is pretty pointless.  The 0001 patch basically updates the
>> FSM as soon as it can after the tuples are actually deleted, so I think
>> we've made the window as small as practical, and I don't really see a need
>> to do extra work (and add substantial extra complexity) to deal with
>> missed cleanup a bit sooner.
>>
>> People who are dealing with this sort of scenario a lot might be well
>> advised to reduce autovacuum's maintenance_work_mem, so that the cleanup
>> cycles happen more often.  That's kind of costly in terms of the number
>> of index scans, but it reduces the amount of cleanup work that can be
>> lost to a cancel.
>>
>> (I'd also argue that a setup such as you describe is very possibly making
>> things worse not better.  Perhaps the 0001 patch will go some way towards
>> making it less necessary to do that.)
>
> Alright, so we just drop 2.

So, that's it then.

Thanks