Re: Optimize external TOAST storage - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Optimize external TOAST storage |
Date | |
Msg-id | CA+TgmobM=vxNL35QwEeyUiRt1GKdtw1=tveE8Fo705h6j8_vYg@mail.gmail.com Whole thread Raw |
In response to | Re: Optimize external TOAST storage (Nathan Bossart <nathandbossart@gmail.com>) |
Responses |
Re: Optimize external TOAST storage
|
List | pgsql-hackers |
On Thu, Mar 17, 2022 at 4:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Thu, Mar 17, 2022 at 01:04:17PM -0400, Robert Haas wrote: > > Right, so perhaps the ultimate thing here would be a more fine-grained > > knob than SET STORAGE EXTERNAL -- something that allows you to specify > > that you want to compress only when it really helps. While some people > > might find that useful, I think the current patch is less ambitious, > > and I think that's OK. It just wants to save something in the cases > > where it's basically free. Unfortunately we've learned that it's never > > *entirely* free because making the last TOAST chunk longer can always > > cost you something, even if it gets longer by only 1 byte. But for > > larger values it's hard for that to be significant. > > I guess I think we should be slightly more ambitious. One idea could be to > create a default_toast_compression_ratio GUC with a default of 0.95. This > means that, by default, a compressed attribute must be 0.95x or less of the > size of the uncompressed attribute to be stored compressed. Like > default_toast_compression, this could also be overridden at the column > level with something like > > ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9; > > If the current "basically free" patch is intended for v15, then maybe all > this extra configurability stuff could wait for a bit, especially if we can > demonstrate a decent read performance boost with a more conservative > setting. However, I don't see anything terribly complicated about the > proposed configurability changes (assuming my proposal makes some amount of > sense to you and others). We seem to have a shortage of "others" showing up with opinions on this topic, but I guess I'm not very confident about the general utility of such a setting. Just to be clear, I'm also not very confident about the usefulness of the existing settings for controlling TOAST. Why is it useful default behavior to try to get rows down to 2kB by default, rather than 1.8kB or 3kB? Even more, why don't we try to compress primarily based on the length of individual attributes and then compress further only if the resulting tuple doesn't fit into a page at all? There doesn't seem to be anything magic about fitting tuples into a quarter-page, yet the code acts as though that's the primary goal - and then, when that didn't turn out to work well in all cases, we added a user-settable parameter (toast_tuple_target) to let you say you really want tuples in table X to fit into a third of a page or a fifth of a page instead of a quarter. And there's some justification for that: a proposal to fundamentally change the algorithm would likely have gotten bogged down for years, and the parameter no doubt lets you solve some problems. Yet if the whole algorithm is wrong, and I think maybe it is, then being able to change the constants is not really getting us where we need to be. Then, too, I'm not very confident about the usefulness of EXTENDED, EXTERNAL, and MAIN. I think it's useful to be able to categorically disable compression (as EXTERNAL does), but you can't categorically disable out-of-line storage because the value can be bigger than the page, so MAIN vs. EXTENDED is just changing the threshold for the use of out-of-line storage. However, it does so in a way that IMHO is not particularly intuitive, which goes back to my earlier point about the algorithm seeming wacky, and it's in any case unclear why we should offer exactly two possible thresholds and not any of the intermediate values. I think I'm prepared to concede that the setting you are proposing here is easier to understand than either of those. Saying "store this value compressed only if you an thereby reduce the size by at least 10%" is an understandable directive which I think will make sense even to people who know very little about PostgreSQL or databases generally, as long as they have some vague idea how compression works. However, I guess I am not confident that the setting would get much use, or that it is the best way to measure what we care about. For instance, if my value is 19kB in length, saving 10% isn't very impactful, because it's still the same number of chunks, and the only benefit I gain is that the last chunk is smaller, which is probably not very impactful because I'm still going to have 8 full chunks. I will gain more if the value is bigger or smaller. Going up, if I increase the size to 19MB or 190MB, I'm really saving a very significant amount of cache space and, potentially, I/O. If I was happy with a 10% savings at 19kB, I will likely be happy with a considerably smaller economy here. Going down, if I decrease the size to 1.9kB, I'm not storing any full-sized chunks any more, so reducing the size of the one partial chunk that I'm storing seems like it could have a real impact, provided of course that other toasted values in the same table are similarly small. If the whole TOAST table is full of just partial chunks, then the smaller they are the more will fit into a page, all things being equal, and thus the smaller the TOAST table will be. Maybe the conclusion here is that more thought is needed before changing anything in this area.... -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: