Re: Disabling Heap-Only Tuples - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Disabling Heap-Only Tuples |
Date | |
Msg-id | CAFiTN-t_DgOwywTEQr60fBihNDsqYyLYe5CEE67dZjsS978Asw@mail.gmail.com |
In response to | Re: Disabling Heap-Only Tuples (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses | Re: Disabling Heap-Only Tuples |
List | pgsql-hackers |
On Fri, Jul 7, 2023 at 3:48 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 7/7/23 11:55, Matthias van de Meent wrote:
> > On Fri, 7 Jul 2023 at 06:53, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >>
> >> On Fri, Jul 7, 2023 at 1:48 AM Matthias van de Meent
> >> <boekewurm+postgres@gmail.com> wrote:
> >>>
> >>> On Wed, 5 Jul 2023 at 19:55, Thom Brown <thom@linux.com> wrote:
> >>>>
> >>>> On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
> >>>> <boekewurm+postgres@gmail.com> wrote:
> >>>>> So what were you thinking of? A session GUC? A table option?
> >>>>
> >>>> Both.
> >>>
> >>> Here's a small patch implementing a new table option max_local_update
> >>> (name very much bikesheddable). Value is -1 (default, disabled) or the
> >>> size of the table in MiB that you still want to allow to update on the
> >>> same page. I didn't yet go for a GUC as I think that has too little
> >>> control on the impact on the system.
> >>
> >> So IIUC, this parameter we can control that instead of putting the new
> >> version of the tuple on the same page, it should choose using
> >> RelationGetBufferForTuple(), and that can reduce the fragmentation
> >> because now if there is space then most of the updated tuple will be
> >> inserted in same pages. But this still can not truncate the pages
> >> from the heap right? because we can not guarantee that the new page
> >> selected by RelationGetBufferForTuple() is not from the end of the
> >> heap, and until we free the pages from the end of the heap, the vacuum
> >> can not truncate any page. Is my understanding correct?
> >
> > Yes. If you don't have pages with (enough) free space for the updated
> > tuples in your table, or if the FSM doesn't accurately reflect the
> > actual state of free space in your table, this won't help (which is
> > also the reason why I run vacuum in the tests). It also won't help if
> > you don't update the tuples physically located at the end of your
> > table, but in the targeted workload this would introduce a bias where
> > new tuple versions are moved to the front of the table.
> >
> > Something to note is that this may result in very bad bloat when this
> > is combined with a low fillfactor: All blocks past max_local_update
> > will be unable to use space reserved by fillfactor because FSM lookups
> > always take fillfactor into account, and all updates (which ignore
> > fillfactor when local) would go through the FSM instead, thus reducing
> > the space available on each block to exactly the fillfactor. So, this
> > might need some extra code to make sure we don't accidentally blow up
> > the table's size with UPDATEs when max_local_update is combined with
> > low fillfactors. I'm not sure where that would fit best.
> >
>
> I know the thread started as "let's disable HOT" and this essentially
> just proposes to do that using a table option. But I wonder if that's
> far too simple to be reliable, because hoping RelationGetBufferForTuple
> happens to do the right thing does not seem great.
>
> I wonder if we should invent some definition of "strategy" that would
> tell RelationGetBufferForTuple what it should aim for ...
>
> I'm imagining either a table option with a couple possible values
> (default, non-hot, first-page, ...) or maybe something even more
> elaborate (perhaps even a callback?).
>
> Now, it's not my intention to hijack this thread, but this discussion
> reminds me one of the ideas from my "BRIN improvements" talk, about
> maybe using BRIN indexes for routing. UPDATEs may be a major issue for
> BRIN, making them gradually worse over time. If we could "tell"
> RelationGetBufferForTuple() which buffers are more suitable (by looking
> at an index, histogram or some approximate mapping), that might help.

IMHO that seems like the right direction for this feature to be useful.
Otherwise, just forcing it to select a page using
RelationGetBufferForTuple() without any input or direction to this
function can behave pretty randomly. In fact, there should be some way
to say "insert the new tuple into a lower block number first" (provided
that block has free space); with that, vacuum might get an opportunity
to truncate some heap pages.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
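
The "lower block number first" idea in the message above can be illustrated with a toy model. This is only a sketch under loud assumptions: a heap is a plain list of live-tuple counts per page, old tuple versions are treated as already reclaimed by vacuum, fillfactor and the FSM are ignored, and every name (`lowest_block_first`, `highest_block_first`, `update_tail`, `truncatable_pages`) is hypothetical rather than a PostgreSQL API. `highest_block_first` is a deliberately unhelpful stand-in for an undirected choice, not a description of how RelationGetBufferForTuple() actually behaves.

```python
from typing import Callable, List, Optional

# A chooser takes (pages, capacity, source_block) and returns a target block
# for the new tuple version, or None to leave the tuple where it is.
Chooser = Callable[[List[int], int, int], Optional[int]]


def lowest_block_first(pages: List[int], capacity: int, source: int) -> Optional[int]:
    """Pick the lowest-numbered block before `source` that has free space."""
    for block in range(source):
        if pages[block] < capacity:
            return block
    return None


def highest_block_first(pages: List[int], capacity: int, source: int) -> Optional[int]:
    """Worst-case stand-in for an undirected choice: prefer the highest block."""
    for block in reversed(range(len(pages))):
        if block != source and pages[block] < capacity:
            return block
    return None


def update_tail(pages: List[int], capacity: int, tail_start: int, choose: Chooser) -> List[int]:
    """Update every tuple stored in blocks >= tail_start, walking from the end.

    Each update moves one tuple to the block picked by `choose`; the old
    version is assumed to be reclaimed immediately (toy assumption).
    """
    pages = list(pages)  # work on a copy
    for source in reversed(range(tail_start, len(pages))):
        while pages[source] > 0:
            target = choose(pages, capacity, source)
            if target is None:
                break
            pages[target] += 1
            pages[source] -= 1
    return pages


def truncatable_pages(pages: List[int]) -> int:
    """Count the empty pages at the end of the heap, which vacuum could truncate."""
    count = 0
    for live in reversed(pages):
        if live != 0:
            break
        count += 1
    return count


# Ten half-full pages (2 live tuples each, room for 4); update the last five.
heap = [2] * 10
directed = update_tail(heap, 4, 5, lowest_block_first)
undirected = update_tail(heap, 4, 5, highest_block_first)

print(directed, truncatable_pages(directed))      # [4, 4, 4, 4, 4, 0, 0, 0, 0, 0] 5
print(undirected, truncatable_pages(undirected))  # [2, 2, 2, 2, 2, 0, 0, 2, 4, 4] 0
```

In this toy run both strategies perform the same updates on the same tuples, but only the directed one leaves a run of empty pages at the end of the heap. The real behaviour would depend on FSM accuracy, fillfactor, and concurrent activity, none of which is modelled here.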