Re: Disabling Heap-Only Tuples - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Disabling Heap-Only Tuples |
Date | |
Msg-id | 20230921231852.lfaueewbrv5lz6nr@awork3.anarazel.de |
In response to | Re: Disabling Heap-Only Tuples (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses | Re: Disabling Heap-Only Tuples |
List | pgsql-hackers |
Hi,

On 2023-09-19 20:20:06 +0200, Matthias van de Meent wrote:
> Mostly agreed, but I think there's a pitfall here. You seem to assume
> we have a perfect oracle that knows the optimal data size, but we
> already know that our estimates can be significantly off. I don't
> quite trust the statistics enough to do any calculations based on the
> number of tuples in the relation. That also ignores the fact that we
> don't actually have any good information about the average size of the
> tuples in the table. So with current statistics, any automated "this
> is how large the table should be" decisions would result in an
> automated footgun, instead of the current patch's where the user has
> to decide to configure it to an explicit value.

The proposed patch already relies on the FSM being reasonably up to date, no? If the FSM doesn't know about free space, the patch won't be able to place tuples earlier in the relation. And if the FSM wrongly thinks there's lots of free space, it'll make updates very expensive. We obviously don't want to scan the whole FSM on an ongoing basis, but visiting the top-level FSM pages and/or having vacuum/analyze update some statistic based on a more thorough analysis of the FSM doesn't seem insane.

A related issue is that an accurate tuple size and an accurate number of tuples aren't really sufficient - if tuples are wide, there can be plenty of space on pages without updates being able to reuse that space. And the width of tuples doesn't have to be evenly distributed, so a simple approach of calculating how many tuples of average width fit on a page, and then using that to come up with the overall number of required pages, isn't necessarily accurate either.

> But about that: I'm not sure what the "footgun" is that you've
> mentioned recently?
> The issue with excessive bloat (when the local_update_limit is set too
> small and fillfactor is low) was fixed in the latest patch nearly
> three weeks ago, so the only remaining issue with misconfiguration is
> slower updates.

There seem to be plenty of footguns. Just to name a few:

- The user has to determine a good value for local_update_limit, without really any good way of doing so.

- A "too low" local_update_limit will often succeed in finding some space in earlier pages, without that providing useful progress on compaction - e.g. because the tuples on the earlier page are subsequently updated and then there's no space anymore - leading to index bloat.

- Configuring local_update_limit as a fixed size will be fragile when the data actually grows, leading to lots of pointless out-of-page updates.

I think a minimal working approach could be to have the configuration be based on the relation size vs. the space known to the FSM: if the target block of an update is higher than ((relation_size - fsm_free_space) * new_reloption_or_guc), try finding the target block via the FSM, even if there's space on the current page. A rough sketch of that check is below.

> Sure, that's not great, but in my opinion not a
> "footgun": performance returns immediately after resetting
> local_update_limit, and no space was lost.

I think there are plenty of ways to get pointless out-of-page updates, and therefore index bloat, with local_update_limit as proposed (see earlier in this email). Once you have such pointless out-of-page updates, disabling local_update_limit won't bring performance back immediately (space usage due to index bloat, and lookup performance issues due to the additional index entries).
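To make that concrete, here is a rough, entirely untested sketch of the kind of check I mean. All names (prefer_fsm_for_update, shrink_target_frac, fsm_free_bytes) are made up, and where the FSM free-space total would come from - the top-level FSM pages, or a statistic maintained by vacuum/analyze - is left open:

```c
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

typedef uint32_t BlockNumber;

/*
 * Decide whether an UPDATE should look up its target block via the FSM
 * instead of reusing free space on the tuple's current page.
 *
 * rel_nblocks        - current relation size in blocks
 * fsm_free_bytes     - approximate total free space known to the FSM
 * shrink_target_frac - the new reloption/GUC: slack factor applied to the
 *                      estimated "compacted" size of the relation
 */
static bool
prefer_fsm_for_update(BlockNumber target_block,
                      BlockNumber rel_nblocks,
                      uint64_t fsm_free_bytes,
                      double shrink_target_frac)
{
    uint64_t    used_bytes = (uint64_t) rel_nblocks * BLCKSZ;

    /* Rough size the relation could shrink to if all FSM space were reused. */
    uint64_t    compacted_bytes =
        fsm_free_bytes >= used_bytes ? 0 : used_bytes - fsm_free_bytes;

    /* Cutoff block: (relation_size - fsm_free_space) * new_reloption_or_guc */
    BlockNumber cutoff_block =
        (BlockNumber) (compacted_bytes / BLCKSZ * shrink_target_frac);

    /* Past the cutoff, prefer the FSM so the tail can eventually be truncated. */
    return target_block > cutoff_block;
}
```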
> Updating the reloption after relation truncation implies having the
> same lock as relation truncation, i.e. AEL (if the vacuum docs are to
> be believed).

Aside: We really need to get rid of the AEL for relation truncation - it's quite painful for hot standby workloads... Thomas has been talking about a patch (and perhaps even posted it) that adds infrastructure providing a "shared smgrrelation". Once we have that, I think we could lower the required lock level for truncation by storing both the filesystem size and the "valid" size. There are a few potential models:

- Vacuum truncation could lower the valid size in-memory, end its transaction, wait for concurrent accesses to the relation to finish, check if/where the relation has been extended since, acquire the extension lock and truncate down to the "valid" size. The danger with that is that the necessary waiting can be long, threatening to starve autovacuum of workers.

- Instead of making a single vacuum wait, we could have one vacuum update the valid size of the relation and also store an xid horizon. Later vacuums can truncate the physical size down to the valid size once there are no snapshot conflicts with said xid anymore.

If we had such a shared smgrrel, we could also make relation extension a lot more efficient, because we would not need to pin all pages that a relation extension "covers". The reason we need to pin the to-be-extended pages is to prevent concurrent scans from reading "new" blocks while the extension is in progress, as otherwise such a buffer could be dirtied and written out, potentially leading to lost writes and other fun issues. But with the shared smgrrel, we could store the size-currently-being-extended-to separately from the filesystem size. If it's not allowed to read the block range covered by those blocks into s_b, the race doesn't exist anymore. A rough sketch of the state such a shared smgrrel could track is appended below.

Greetings,

Andres Freund
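A very rough sketch of that shared state, purely to illustrate the idea - nothing like this exists yet, and all type and field names are hypothetical:

```c
#include <stdint.h>

typedef uint32_t BlockNumber;
typedef uint32_t TransactionId;

/* Hypothetical per-relation state shared across backends. */
typedef struct SharedSmgrRelation
{
    /* Size of the underlying file(s) on disk, in blocks. */
    BlockNumber filesystem_nblocks;

    /*
     * Highest block scans are allowed to access.  Vacuum can lower this
     * without immediately truncating the file, avoiding the need for an
     * AccessExclusiveLock at truncation time.
     */
    BlockNumber valid_nblocks;

    /*
     * For the second model above: the physical truncation down to
     * valid_nblocks has to wait until no snapshot can conflict with this
     * horizon anymore.
     */
    TransactionId truncate_horizon_xid;

    /*
     * For cheaper extension: blocks in [filesystem_nblocks,
     * extending_to_nblocks) may already exist on disk but must not be read
     * into shared buffers yet, so the extending backend doesn't have to
     * keep them all pinned.
     */
    BlockNumber extending_to_nblocks;
} SharedSmgrRelation;
```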