Re: Unhappy about API changes in the no-fsm-for-small-rels patch - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Unhappy about API changes in the no-fsm-for-small-rels patch |
Date | |
Msg-id | CAA4eK1Kf6dDNkgDHJM0q5KsvdUMzcYgB5-jpmR0ity6ug4KXBA@mail.gmail.com Whole thread Raw |
In response to | Re: Unhappy about API changes in the no-fsm-for-small-rels patch (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Re: Unhappy about API changes in the no-fsm-for-small-rels patch Re: Unhappy about API changes in the no-fsm-for-small-rels patch |
List | pgsql-hackers |
On Tue, Apr 23, 2019 at 10:59 PM Andres Freund <andres@anarazel.de> wrote: > > > On 2019-04-22 18:49:44 +0530, Amit Kapila wrote: > > > > /* > > > > @@ -1132,9 +1110,6 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks) > > > > /* Set the status of the cached target block to 'unavailable'. */ > > > > cached_target_block = RelationGetTargetBlock(rel); > > > > if (cached_target_block != InvalidBlockNumber && > > > > cached_target_block < cur_nblocks) > > > > - fsm_local_map.map[cached_target_block] = FSM_LOCAL_NOT_AVAIL; > > > > + rel->fsm_local_map->map[cached_target_block] = FSM_LOCAL_NOT_AVAIL; > > > > } > > > > > > I think there shouldn't be any need for this anymore. After this change > > > we shouldn't invalidate the map until there's no space on it - thereby > > > addressing the cost overhead, and greatly reducing the likelihood that > > > the local FSM can lead to increased bloat. > I have removed the code that was invalidating cached target block from the above function. > > If we invalidate it only when there's no space on the page, then when > > should we set it back to available, because if we don't do that, then > > we might miss the space due to concurrent deletes. > > Well, deletes don't traditionally (i.e. with an actual FSM) mark free > space as available (for heap). I think RecordPageWithFreeSpace() should > issue a invalidation if there's no FSM, and the block goes from full to > empty (as there's no half-full IIUC). > Sure, we can do that. > > > > /* > > > > @@ -1168,18 +1143,18 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks) > > > > * This function is used when there is no FSM. > > > > */ > > > > static BlockNumber > > > > -fsm_local_search(void) > > > > +fsm_local_search(Relation rel) > > > > { > > > > BlockNumber target_block; > > > > > > > > /* Local map must be set by now. */ > > > > - Assert(FSM_LOCAL_MAP_EXISTS); > > > > + Assert(FSM_LOCAL_MAP_EXISTS(rel)); > > > > > > > > - target_block = fsm_local_map.nblocks; > > > > + target_block = rel->fsm_local_map->nblocks; > > > > do > > > > { > > > > target_block--; > > > > - if (fsm_local_map.map[target_block] == FSM_LOCAL_AVAIL) > > > > + if (rel->fsm_local_map->map[target_block] == FSM_LOCAL_AVAIL) > > > > return target_block; > > > > } while (target_block > 0); > > > > > > > > @@ -1189,7 +1164,22 @@ fsm_local_search(void) > > > > * first, which would otherwise lead to the same conclusion again and > > > > * again. > > > > */ > > > > - FSMClearLocalMap(); > > > > + fsm_clear_local_map(rel); > > > > > > I'm not sure I like this. My inclination would be that we should be able > > > to check the local fsm repeatedly even if there's no space in the > > > in-memory representation - otherwise the use of the local FSM increases > > > bloat. > > > > > > > Do you mean to say that we always check all the pages (say 4) > > irrespective of their state in the local map? > > I was wondering that, yes. But I think just issuing invalidations is the > right approach instead, see above. > Righ issuing invalidations can help with that. > > > I think we should first try to see in this new scheme (a) when to set > > the map, (b) when to clear it, (c) how to use. I have tried to > > summarize my thoughts about it, let me know what do you think about > > the same? > > > > When to set the map. > > At the beginning (the first time relation is used in the backend), the > > map will be clear. When the first time in the backend, we find that > > FSM doesn't exist and the number of blocks is lesser than > > HEAP_FSM_CREATION_THRESHOLD, we set the map for the total blocks that > > exist at that time and mark all or alternate blocks as available. > > I think the alternate blocks scheme has to go. It's not defensible. > Fair enough, I have changed it in the attached patch. However, I think we should test it once the patch is ready as we have seen a small performance regression due to that. > And sure, leaving that aside we could store one byte per block Hmm, I think you mean to say one-bit per block, right? > - it's > just not what the patch has done so far (or rather, it used one byte per > block, but only utilized one bit of it). Right, I think this is an independently useful improvement, provided it doesn't have any additional overhead or complexity. > It's possible that'd come with > some overhead - I've not thought sufficiently about it: I assume we'd > still start out in each backend assuming each page is empty, and we'd > then rely on RelationGetBufferForTuple() to update that. What I wonder > is if we'd need to check if an on-disk FSM has been created every time > the space on a page is reduced? I think not, if we use invalidations to > notify others of the existance of an on-disk FSM. There's a small race, > but that seems ok. > Do you mean to say that vacuum or some backend should invalidate in case it first time creates the FSM for a relation? I think it is quite possible that the vacuum takes time to trigger such invalidation if there are fewer deletions. And we won't be able to use newly added page/s. IIUC, you are suggesting to issue invalidations when the (a) vacuum finds there is no FSM and page has more space now, (b) invalidation to notify the existence of FSM. IT seems to me that issuing more invalidations for the purpose of FSM can lead to an increased number of relation builds in the overall system. I think this can have an impact when there are many small relations in the system which in some scenarios might not be uncommon. The two improvements in this code which are discussed in this thread and can be done independently to this patch are: a. use one bit to represent each block in the map. This gives us the flexibility to use the map for the different threshold for some other storage. b. improve the usage of smgrexists by checking smgr_fsm_nblocks. John, can you implement these two improvements either on HEAD or on top of this patch? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: