Thread: Re: non-HOT update not looking at FSM for large tuple update
On Fri, Mar 12, 2021 at 8:45 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> If this case isn't added, the lower else branch will fail to find
> fitting pages for len > maxPaddedFsmRequest tuples; potentially
> extending the relation when there is actually still enough space
> available.
Okay, with that it looks better to go back to using Max().
Also in v4:
- With the separate constant you suggested, I split up the comment into two parts.
- I've added a regression test to insert.sql similar to your earlier test, but we cannot use vacuum, since in parallel tests there could still be tuples visible to other transactions. It's still possible to test almost-all-free by inserting a small tuple.
- Draft commit message
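To make the test shape concrete, the approach described above might look roughly like this (a hypothetical sketch: the table name, fillfactor, and tuple sizes are illustrative only, and the test actually committed to insert.sql may differ):

```sql
-- Hypothetical sketch of the described test; names and sizes are illustrative.
CREATE TABLE large_tuple_test (a int, b text) WITH (fillfactor = 10);

-- Leave the first page almost entirely free by inserting one small tuple.
-- No VACUUM is used, since in parallel regression tests other transactions
-- could still see tuples, making VACUUM's effects unreliable.
INSERT INTO large_tuple_test VALUES (1, NULL);

-- A tuple much larger than fillfactor allows should still be placed on the
-- almost-empty first page instead of extending the relation.
INSERT INTO large_tuple_test VALUES (2, repeat('a', 1000));

-- The table should still occupy a single page.
SELECT pg_relation_size('large_tuple_test') / current_setting('block_size')::int
       AS pages;
```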
Attachment
On Wed, 17 Mar 2021 at 21:52, John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Fri, Mar 12, 2021 at 8:45 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> >
> > If this case isn't added, the lower else branch will fail to find
> > fitting pages for len > maxPaddedFsmRequest tuples; potentially
> > extending the relation when there is actually still enough space
> > available.
>
> Okay, with that it looks better to go back to using Max().
>
> Also in v4:
>
> - With the separate constant you suggested, I split up the comment into two parts.

+ * The minimum space on a page for it to be considered "empty" regardless
+ * of fillfactor. We base this on MaxHeapTupleSize, minus a small amount
+ * of slack. We allow slack equal to 1/8 the maximum space that could be
+ * taken by line pointers, which is somewhat arbitrary.

+ * We want to allow inserting a large tuple into an empty page even if
+ * that would violate the fillfactor. Otherwise, we would unnecessarily
+ * extend the relation. Instead, ask the FSM for maxPaddedFsmRequest
+ * bytes. This will allow it to return a page that is not quite empty
+ * because of unused line pointers

How about

+ * Because pages that have no items left can still have space allocated
+ * to item pointers, we consider pages "empty" for FSM requests when they
+ * have at most 1/8 of their MaxHeapTuplesPerPage item pointers' space
+ * allocated. This is a somewhat arbitrary number, but should prevent
+ * most unnecessary relation extensions due to not finding "empty" pages
+ * while inserting combinations of large tuples with low fillfactors.

+ * When the free space to be requested from the FSM is greater than
+ * maxPaddedFsmRequest, this is considered equivalent to requesting
+ * insertion on an "empty" page, so instead of failing to find a page
+ * with more empty space than an "empty" page and extending the relation,
+ * we try to find a page which is considered "empty".

This is slightly more verbose, but I think this clarifies the
reasoning why we need this a bit better. Feel free to reject or adapt
as needed.

> - I've added a regression test to insert.sql similar to your earlier test, but we cannot use vacuum, since in parallel tests there could still be tuples visible to other transactions. It's still possible to test almost-all-free by inserting a small tuple.
>
> - Draft commit message

Apart from these mainly readability changes in comments, I think this
is ready.

> --
> John Naylor
> EDB: http://www.enterprisedb.com
On Thu, Mar 18, 2021 at 5:30 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> > + * The minimum space on a page for it to be considered "empty" regardless
> > + * of fillfactor. We base this on MaxHeapTupleSize, minus a small amount
> > + * of slack. We allow slack equal to 1/8 the maximum space that could be
> > + * taken by line pointers, which is somewhat arbitrary.
>
> > + * We want to allow inserting a large tuple into an empty page even if
> > + * that would violate the fillfactor. Otherwise, we would unnecessarily
> > + * extend the relation. Instead, ask the FSM for maxPaddedFsmRequest
> > + * bytes. This will allow it to return a page that is not quite empty
> > + * because of unused line pointers
>
> How about
>
> + * Because pages that have no items left can still have space allocated
> + * to item pointers, we consider pages "empty" for FSM requests when they
> + * have at most 1/8 of their MaxHeapTuplesPerPage item pointers' space
> + * allocated. This is a somewhat arbitrary number, but should prevent
> + * most unnecessary relation extensions due to not finding "empty" pages
> + * while inserting combinations of large tuples with low fillfactors.
>
> + * When the free space to be requested from the FSM is greater than
> + * maxPaddedFsmRequest, this is considered equivalent to requesting
> + * insertion on an "empty" page, so instead of failing to find a page
> + * with more empty space than an "empty" page and extending the relation,
> + * we try to find a page which is considered "empty".
>
> This is slightly more verbose, but I think this clarifies the
> reasoning why we need this a bit better. Feel free to reject or adapt
> as needed.
I like this in general, but it still has some rough edges. I've made another attempt in v5 incorporating your suggestions. Let me know what you think.
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
On Fri, 19 Mar 2021 at 19:16, John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Thu, Mar 18, 2021 at 5:30 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> >
> > This is slightly more verbose, but I think this clarifies the
> > reasoning why we need this a bit better. Feel free to reject or adapt
> > as needed.
>
> I like this in general, but it still has some rough edges. I've made another attempt in v5 incorporating your suggestions. Let me know what you think.

That is indeed better. I believe this is ready, so I've marked it as RFC in the commitfest application.

With regards,

Matthias van de Meent.
I gather this is important when large_upd_rate=rate(cross-page update bytes
for tuples larger than fillfactor) exceeds small_ins_rate=rate(insert bytes
for tuples NOT larger than fillfactor). That is a plausible outcome when
inserts are rare, and table bloat then accrues at
large_upd_rate-small_ins_rate. I agree this patch improves behavior.

Does anyone have a strong opinion on whether to back-patch? I am weakly
inclined not to back-patch, because today's behavior might happen to perform
better when large_upd_rate-small_ins_rate<0. Besides the usual choices of
back-patching or not back-patching, we could back-patch with a stricter
threshold. Suppose we accepted pages for larger-than-fillfactor tuples when
the pages have at least
BLCKSZ-SizeOfPageHeaderData-sizeof(ItemIdData)-MAXALIGN(MAXALIGN(SizeofHeapTupleHeader)+1)+1
bytes of free space. That wouldn't reuse a page containing a one-column
tuple, but it would reuse a page having up to eight line pointers.

On Fri, Mar 19, 2021 at 02:16:22PM -0400, John Naylor wrote:
> --- a/src/backend/access/heap/hio.c
> +++ b/src/backend/access/heap/hio.c
> @@ -335,11 +335,24 @@ RelationGetBufferForTuple(Relation relation, Size len,
> +	const Size	maxPaddedFsmRequest = MaxHeapTupleSize -
> +		(MaxHeapTuplesPerPage / 8 * sizeof(ItemIdData));

In evaluating whether this is a good choice of value, I think about the
expected page lifecycle. A tuple barely larger than fillfactor (roughly
len=1+BLCKSZ*fillfactor/100) will start on a roughly-empty page. As long as
the tuple exists, the server will skip that page for inserts. Updates can
cause up to floor(99/fillfactor) same-size versions of the tuple to occupy
the page simultaneously, creating that many line pointers. At the
fillfactor=10 minimum, it's good to accept otherwise-empty pages having at
least nine line pointers, so a page can restart the aforementioned lifecycle.
Tolerating even more line pointers helps when updates reduce tuple size or
when the page was used for smaller tuples before it last emptied. At the
BLCKSZ=8192 default, this maxPaddedFsmRequest allows 36 line pointers (good
or somewhat high). At the BLCKSZ=1024 minimum, it allows 4 line pointers
(low). At the BLCKSZ=32768 maximum, 146 (likely excessive). I'm not
concerned about optimizing non-default block sizes, so let's keep your
proposal.

Comments and the maxPaddedFsmRequest variable name use term "fsm" for things
not specific to the FSM. For example, the patch's test case doesn't use the
FSM. (That is fine. Ordinarily, RelationGetTargetBlock() furnishes its
block. Under CLOBBER_CACHE_ALWAYS, the "try the last page" logic does so.
An FSM-using test would contain a VACUUM.) I plan to commit the attached
version; compared to v5, it updates comments and renames this variable.

Thanks,
nm
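The line-pointer counts given here (36, 4, and 146) can be spot-checked with a short script. This is a sketch only: it hard-codes the usual values of the page-layout macros (SizeOfPageHeaderData = 24, sizeof(ItemIdData) = 4, SizeofHeapTupleHeader = 23) and assumes 8-byte MAXALIGN:

```python
# Spot-check of the line-pointer counts discussed above. Assumes the
# usual values of PostgreSQL's page-layout macros and 8-byte MAXALIGN.

PAGE_HEADER = 24    # SizeOfPageHeaderData
ITEMID = 4          # sizeof(ItemIdData)
TUPLE_HEADER = 23   # SizeofHeapTupleHeader

def maxalign(n, align=8):
    # Round n up to the next multiple of align, like the MAXALIGN macro.
    return (n + align - 1) & ~(align - 1)

def max_heap_tuples_per_page(blcksz):
    # MaxHeapTuplesPerPage: how many minimal tuples (aligned header plus
    # item id) fit in the space after the page header.
    return (blcksz - PAGE_HEADER) // (maxalign(TUPLE_HEADER) + ITEMID)

def tolerated_line_pointers(blcksz):
    # The patch shaves off space for 1/8 of MaxHeapTuplesPerPage item
    # pointers, so a page keeping that many leftover line pointers still
    # satisfies a maxPaddedFsmRequest-sized request.
    return max_heap_tuples_per_page(blcksz) // 8

for blcksz in (1024, 8192, 32768):
    print(blcksz, tolerated_line_pointers(blcksz))
# prints: 1024 4, then 8192 36, then 32768 146
```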
Attachment
Hi Noah,

Thanks for taking a look at this patch.

> In evaluating whether this is a good choice of value, I think about the
> expected page lifecycle. A tuple barely larger than fillfactor (roughly
> len=1+BLCKSZ*fillfactor/100) will start on a roughly-empty page. As long
> as the tuple exists, the server will skip that page for inserts. Updates
> can cause up to floor(99/fillfactor) same-size versions of the tuple to
> occupy the page simultaneously, creating that many line pointers. At the
> fillfactor=10 minimum, it's good to accept otherwise-empty pages having at
> least nine line pointers, so a page can restart the aforementioned
> lifecycle. Tolerating even more line pointers helps when updates reduce
> tuple size or when the page was used for smaller tuples before it last
> emptied. At the BLCKSZ=8192 default, this maxPaddedFsmRequest allows 36
> line pointers (good or somewhat high). At the BLCKSZ=1024 minimum, it
> allows 4 line pointers (low). At the BLCKSZ=32768 maximum, 146 (likely
> excessive). I'm not concerned about optimizing non-default block sizes,
> so let's keep your proposal.

Agreed. You briefly mention this already, but the case that caused me to
report this was exactly the one where under normal circumstances each UPDATE
would be small. However, in rare cases, the tuple that is updated grows in
size to 1k bytes (the specific case we encountered sometimes would under
specific circumstances write extra info in a field, which would otherwise be
NULL). Suppose that this 1k UPDATE does not fit into the current page (so no
HOT update), then a new page would be created (HEAD behavior). However, it
is very likely that the next updates to this same tuple will be the regular
size again. This causes the higher number of line pointers on the page.

-Floris
On Sat, Mar 27, 2021 at 3:00 AM Noah Misch <noah@leadboat.com> wrote:
>
> Does anyone have a strong opinion on whether to back-patch? I am weakly
> inclined not to back-patch, because today's behavior might happen to perform
> better when large_upd_rate-small_ins_rate<0.
It's not a clear case. The present behavior is clearly a bug, but only manifests in rare situations. The risk of the fix affecting other situations is not zero, as you mention, but (thinking briefly about this and I could be wrong) the consequences don't seem as big as the reported case of growing table size.
> Besides the usual choices of
> back-patching or not back-patching, we could back-patch with a stricter
> threshold. Suppose we accepted pages for larger-than-fillfactor tuples when
> the pages have at least
> BLCKSZ-SizeOfPageHeaderData-sizeof(ItemIdData)-MAXALIGN(MAXALIGN(SizeofHeapTupleHeader)+1)+1
> bytes of free space. That wouldn't reuse a page containing a one-column
> tuple, but it would reuse a page having up to eight line pointers.
I'm not sure how much that would help in the reported case that started this thread.
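For what it's worth, the "up to eight line pointers" figure in the quoted proposal can be checked numerically. A sketch of that arithmetic, assuming BLCKSZ = 8192, the usual page-layout macro values, and 8-byte MAXALIGN:

```python
# Check of the stricter back-patch threshold quoted above. Assumes
# BLCKSZ = 8192 and the usual page-layout macro values with 8-byte MAXALIGN.

BLCKSZ = 8192
PAGE_HEADER = 24    # SizeOfPageHeaderData
ITEMID = 4          # sizeof(ItemIdData)
TUPLE_HEADER = 23   # SizeofHeapTupleHeader

def maxalign(n, align=8):
    # Round n up to the next multiple of align, like the MAXALIGN macro.
    return (n + align - 1) & ~(align - 1)

# The proposed threshold:
# BLCKSZ - SizeOfPageHeaderData - sizeof(ItemIdData)
#        - MAXALIGN(MAXALIGN(SizeofHeapTupleHeader) + 1) + 1
threshold = (BLCKSZ - PAGE_HEADER - ITEMID
             - maxalign(maxalign(TUPLE_HEADER) + 1) + 1)

def free_space(n_line_pointers):
    # Free space on a page holding nothing but n leftover line pointers.
    return BLCKSZ - PAGE_HEADER - n_line_pointers * ITEMID

# Largest leftover-line-pointer count whose page still clears the threshold:
n = 0
while free_space(n + 1) >= threshold:
    n += 1
print(threshold, n)  # prints: 8133 8
```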
> Comments and the maxPaddedFsmRequest variable name use term "fsm" for things
> not specific to the FSM. For example, the patch's test case doesn't use the
> FSM. (That is fine. Ordinarily, RelationGetTargetBlock() furnishes its
> block. Under CLOBBER_CACHE_ALWAYS, the "try the last page" logic does so. An
> FSM-using test would contain a VACUUM.) I plan to commit the attached
> version; compared to v5, it updates comments and renames this variable.
Looks good to me, thanks!
--
John Naylor
EDB: http://www.enterprisedb.com
On Sat, Mar 27, 2021 at 11:26:47AM -0400, John Naylor wrote:
> On Sat, Mar 27, 2021 at 3:00 AM Noah Misch <noah@leadboat.com> wrote:
> > Does anyone have a strong opinion on whether to back-patch? I am weakly
> > inclined not to back-patch, because today's behavior might happen to
> > perform better when large_upd_rate-small_ins_rate<0.
>
> It's not a clear case. The present behavior is clearly a bug, but only
> manifests in rare situations. The risk of the fix affecting other
> situations is not zero, as you mention, but (thinking briefly about this
> and I could be wrong) the consequences don't seem as big as the reported
> case of growing table size.

I agree sites that are hurting now will see a net benefit. I can call it a
bug that we treat just-extended pages differently from existing
zero-line-pointer pages (e.g. pages left by RelationAddExtraBlocks()).
Changing how we treat pages having 100 bytes of data feels different to me.
It's more like a policy decision, not a clear bug fix. I'm open to
back-patching, but I plan to do so only if a few people report being firmly
in favor.

> > Besides the usual choices of back-patching or not back-patching, we
> > could back-patch with a stricter threshold. Suppose we accepted pages
> > for larger-than-fillfactor tuples when the pages have at least
> > BLCKSZ-SizeOfPageHeaderData-sizeof(ItemIdData)-MAXALIGN(MAXALIGN(SizeofHeapTupleHeader)+1)+1
> > bytes of free space. That wouldn't reuse a page containing a one-column
> > tuple, but it would reuse a page having up to eight line pointers.
>
> I'm not sure how much that would help in the reported case that started
> this thread.

I'm not sure, either. The thread email just before yours (27 Mar 2021
10:24:00 +0000) does suggest it would help less than the main proposal.