Re: Unhappy about API changes in the no-fsm-for-small-rels patch - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Date
Msg-id CAA4eK1Kf6dDNkgDHJM0q5KsvdUMzcYgB5-jpmR0ity6ug4KXBA@mail.gmail.com
Whole thread Raw
In response to Re: Unhappy about API changes in the no-fsm-for-small-rels patch  (Andres Freund <andres@anarazel.de>)
Responses Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
List pgsql-hackers
On Tue, Apr 23, 2019 at 10:59 PM Andres Freund <andres@anarazel.de> wrote:
> > > On 2019-04-22 18:49:44 +0530, Amit Kapila wrote:
> > > >  /*
> > > > @@ -1132,9 +1110,6 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > > >       /* Set the status of the cached target block to 'unavailable'. */
> > > >       cached_target_block = RelationGetTargetBlock(rel);
> > > >       if (cached_target_block != InvalidBlockNumber &&
> > > >               cached_target_block < cur_nblocks)
> > > > -             fsm_local_map.map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> > > > +             rel->fsm_local_map->map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> > > >  }
> > >
> > > I think there shouldn't be any need for this anymore. After this change
> > > we shouldn't invalidate the map until there's no space on it - thereby
> > > addressing the cost overhead, and greatly reducing the likelihood that
> > > the local FSM can lead to increased bloat.
>

I have removed the code that was invalidating cached target block from
the above function.

> > If we invalidate it only when there's no space on the page, then when
> > should we set it back to available, because if we don't do that, then
> > we might miss the space due to concurrent deletes.
>
> Well, deletes don't traditionally (i.e. with an actual FSM) mark free
> space as available (for heap). I think RecordPageWithFreeSpace() should
> issue a invalidation if there's no FSM, and the block goes from full to
> empty (as there's no half-full IIUC).
>

Sure, we can do that.

> > > >  /*
> > > > @@ -1168,18 +1143,18 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > > >   * This function is used when there is no FSM.
> > > >   */
> > > >  static BlockNumber
> > > > -fsm_local_search(void)
> > > > +fsm_local_search(Relation rel)
> > > >  {
> > > >       BlockNumber target_block;
> > > >
> > > >       /* Local map must be set by now. */
> > > > -     Assert(FSM_LOCAL_MAP_EXISTS);
> > > > +     Assert(FSM_LOCAL_MAP_EXISTS(rel));
> > > >
> > > > -     target_block = fsm_local_map.nblocks;
> > > > +     target_block = rel->fsm_local_map->nblocks;
> > > >       do
> > > >       {
> > > >               target_block--;
> > > > -             if (fsm_local_map.map[target_block] == FSM_LOCAL_AVAIL)
> > > > +             if (rel->fsm_local_map->map[target_block] == FSM_LOCAL_AVAIL)
> > > >                       return target_block;
> > > >       } while (target_block > 0);
> > > >
> > > > @@ -1189,7 +1164,22 @@ fsm_local_search(void)
> > > >        * first, which would otherwise lead to the same conclusion again and
> > > >        * again.
> > > >        */
> > > > -     FSMClearLocalMap();
> > > > +     fsm_clear_local_map(rel);
> > >
> > > I'm not sure I like this. My inclination would be that we should be able
> > > to check the local fsm repeatedly even if there's no space in the
> > > in-memory representation - otherwise the use of the local FSM increases
> > > bloat.
> > >
> >
> > Do you mean to say that we always check all the pages (say 4)
> > irrespective of their state in the local map?
>
> I was wondering that, yes. But I think just issuing invalidations is the
> right approach instead, see above.
>

Righ issuing invalidations can help with that.

>
> > I think we should first try to see in this new scheme (a) when to set
> > the map, (b) when to clear it, (c) how to use.  I have tried to
> > summarize my thoughts about it, let me know what do you think about
> > the same?
> >
> > When to set the map.
> > At the beginning (the first time relation is used in the backend), the
> > map will be clear.  When the first time in the backend, we find that
> > FSM doesn't exist and the number of blocks is lesser than
> > HEAP_FSM_CREATION_THRESHOLD, we set the map for the total blocks that
> > exist at that time and mark all or alternate blocks as available.
>
> I think the alternate blocks scheme has to go. It's not defensible.
>

Fair enough, I have changed it in the attached patch.  However, I
think we should test it once the patch is ready as we have seen a
small performance regression due to that.

> And sure, leaving that aside we could store one byte per block

Hmm, I think you mean to say one-bit per block, right?

> - it's
> just not what the patch has done so far (or rather, it used one byte per
> block, but only utilized one bit of it).

Right, I think this is an independently useful improvement, provided
it doesn't have any additional overhead or complexity.

>  It's possible that'd come with
> some overhead - I've not thought sufficiently about it: I assume we'd
> still start out in each backend assuming each page is empty, and we'd
> then rely on RelationGetBufferForTuple() to update that. What I wonder
> is if we'd need to check if an on-disk FSM has been created every time
> the space on a page is reduced?  I think not, if we use invalidations to
> notify others of the existance of an on-disk FSM. There's a small race,
> but that seems ok.
>

Do you mean to say that vacuum or some backend should invalidate in
case it first time creates the FSM for a relation? I think it is quite
possible that the vacuum takes time to trigger such invalidation if
there are fewer deletions.  And we won't be able to use newly added
page/s.

IIUC, you are suggesting to issue invalidations when the (a) vacuum
finds there is no FSM and page has more space now, (b) invalidation to
notify the existence of FSM.  IT seems to me that issuing more
invalidations for the purpose of FSM can lead to an increased number
of relation builds in the overall system.  I think this can have an
impact when there are many small relations in the system which in some
scenarios might not be uncommon.

The two improvements in this code which are discussed in this thread
and can be done independently to this patch are:
a. use one bit to represent each block in the map.  This gives us the
flexibility to use the map for the different threshold for some other
storage.
b. improve the usage of smgrexists by checking smgr_fsm_nblocks.

John, can you implement these two improvements either on HEAD or on
top of this patch?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Paul Guo
Date:
Subject: Re: [Patch] Check file type before calling AllocateFile() for filesout of pg data directory to avoid potential issues (e.g. hang).
Next
From: David Fetter
Date:
Subject: Re: [PATCH v1] Show whether tables are logged in \dt+