Re: Disabling Heap-Only Tuples - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Disabling Heap-Only Tuples
Date:
Msg-id: CA+TgmobCwFL94gxKi6jofxmfpggjJEvwsnaH4wjKJWMCTC3axA@mail.gmail.com
In response to: Re: Disabling Heap-Only Tuples (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses: Re: Disabling Heap-Only Tuples; Re: Disabling Heap-Only Tuples
List: pgsql-hackers
On Tue, Sep 5, 2023 at 11:15 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I don't think that is a good comparison. While most people probably
> never need to touch "local_update_limit", "work_mem" is something everybody
> has to consider.
>
> And it is not so hard to tune: the setting would be the desired table
> size, and you could use pgstattuple to find a good value.

What I suspect would happen, though, is that you'd end up tuning the value over and over. You'd set it to some value, and after some number of vacuums you might realize that you could save even more disk space by reducing it a bit further; or maybe your data set would grow a bit and you'd have to increase it a little (or a lot). And if you didn't keep adjusting it, then maybe something quite bad would happen to your database.

work_mem isn't quite the same, in the sense that most people don't need to keep iteratively tuning work_mem, at least not in my experience. You figure out a value that works OK in practice and then leave it alone. The problem is mostly discovering what that initial value ought to be, which is often hard.

But what is the same here and in the case of work_mem is that you can suddenly get hosed if the situation changes substantially and you don't respond by updating the parameter setting. In the case of work_mem, again in my experience, it's quite common for people to suddenly find themselves in a lot of trouble after a load spike, because now they're running many more copies of the same query and the machine runs out of memory. The equivalent problem here would be the table suddenly getting a lot bigger due to a load spike or some change in the way the application is used. Then a setting that was previously keeping the table pleasantly small and un-bloated on disk would instead cause tons of updates that would have been HOT to become non-HOT, which could very easily make both the table and its indexes bloat quite rapidly. I really don't like the idea of an anti-bloat feature that, when set to the wrong value, becomes a bloat-amplification feature. I don't know how to describe that other than "fragile and dangerous."

Imagine a hypothetical feature that knew how small the table could reasonably be kept, say by magic, and did non-HOT updates instead of HOT updates whenever doing so would allow moving a tuple from a page beyond that magical boundary to an earlier page. Such a feature would not have the downsides that this one does: if there were opportunities to make the table smaller, the system would take advantage of them automatically, and if the table grew, the system would automatically become more relaxed to stay out of trouble. Such a feature is clearly more work to design and implement than what is proposed here, but it would also work a lot better in practice.

In fact, I daresay that if we accept the feature as proposed, somebody's going to go out and write a tool to calculate what the threshold ought to be and automatically adjust it as things change. Users will then divide into two camps:

- People who try to tune it manually and get burned if anything changes on their system.
- People who use that out-of-core tool.

So the out-of-core tool that does this tuning becomes a stealth dependency for any user who is facing this problem. Gosh, don't we have enough of those already? Connection pooling is perhaps the most obvious example, but far from the only one.

--
Robert Haas
EDB: http://www.enterprisedb.com
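[Editor's note: for concreteness, here is a minimal sketch of the pgstattuple-based tuning workflow Laurenz describes above. pgstattuple is the existing contrib extension; everything else is assumed: "local_update_limit" is the proposed, uncommitted setting under discussion in this thread, its page-based unit and the ALTER TABLE syntax are guesses at how a patch might expose it, the table name "mytable" is a placeholder, and the 10% headroom factor is arbitrary.]

```sql
-- Sketch only: pgstattuple is real contrib functionality, but
-- local_update_limit is a proposed (uncommitted) setting, and the
-- page-based unit, ALTER TABLE syntax, and 10% headroom are assumptions.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Smallest page count that could hold the live tuples, with ~10% slack
-- so future HOT updates can still find room on each page.
SELECT ceil(tuple_len / (current_setting('block_size')::numeric * 0.9))
           AS suggested_limit_pages,
       pg_relation_size('mytable') / current_setting('block_size')::bigint
           AS current_pages
FROM pgstattuple('mytable');

-- Hypothetical application of the proposed setting:
-- ALTER TABLE mytable SET (local_update_limit = 1000);
```

[The objection in the message above is precisely that suggested_limit_pages is a moving target: once the live data outgrows whatever value was configured, the setting starts manufacturing non-HOT updates instead of preventing bloat.]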