Thread: Optimize external TOAST storage
Hi,
For Toast storage [1] in PostgreSQL, first, the attribute value is compressed
and then divided into chunks. The problem with storing compressed value is, if we
are not saving enough space such that it reduces the #chunks then we end up
adding extra decompression cost on every read.
Based on the discussion with Robert and Dilip, we tried to optimize this
process. The idea is to TOAST the compressed value only if we are saving at least
1 chunk (2KB default) of disk storage, else use the uncompressed one. In this way,
we will save decompression costs without losing much on storage.
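To make the rule concrete, here is a minimal standalone sketch of the decision (the constants and names are illustrative only, not the patch code; the real chunk size is TOAST_MAX_CHUNK_SIZE, roughly 2KB by default):

#include <stdbool.h>
#include <stdio.h>

#define CHUNK_SIZE 2000              /* illustrative; ~TOAST_MAX_CHUNK_SIZE */

/* Number of TOAST chunks needed for a value of the given size. */
static int
nchunks(int size)
{
    return (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* Store the compressed form only if it saves at least one chunk on disk. */
static bool
worth_storing_compressed(int orig_size, int compressed_size)
{
    return nchunks(compressed_size) < nchunks(orig_size);
}

int
main(void)
{
    /* Test 2 numbers: 16505 -> 16050 bytes is 9 chunks either way, so keep it uncompressed. */
    printf("%d\n", worth_storing_compressed(16505, 16050));   /* prints 0 */
    /* A hypothetical 8000 -> 3000 bytes is 4 chunks vs 2, so keep it compressed. */
    printf("%d\n", worth_storing_compressed(8000, 3000));     /* prints 1 */
    return 0;
}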
In our tests, we have observed read performance improvements of up to 28% by
giving up at most TOAST_MAX_CHUNK_SIZE (2KB) bytes of disk storage per value. The
gain is larger on large attributes (size > 4KB) because compression/decompression
cost increases with size.
However, we have found that this assumption does not hold when the data compression
ratio is high and the size is barely big enough to cause TOASTing. For example,
in test 4.b below, we didn't get any performance advantage, but the table
size grew by 42% because of storing uncompressed values.

Test Setup:
Create table t1_lz4 ( a text compression lz4, b text compression lz4);
-- Generate random data
create or replace function generate_att_data(len_info int)
returns text
language plpgsql
as
$$
declare
value text;
begin
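-- each md5() is 32 hex chars; with array separators that is roughly 33 bytes per element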
select array_agg(md5(g::text))
into value
from generate_series(1, round(len_info/33)::int) g;
return value;
end;
$$;
--Test
Select b from t1_lz4;
Test 1:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
Summary:
Attribute size: original: 7925 bytes, after compression: 7845 bytes
Time for select: head: 42 sec, patch: 37 sec, Performance Gain: 11%
table size: Head 1662 MB, Patch: 1662 MB
Test 2:
Data: rows 100000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
Summary:
Attribute size: original: 16505 bytes, after compression: 16050 bytes
Time for select: head: 35.6 sec, patch: 30 sec, Performance Gain: 14%
table size: Head 1636 MB, Patch: 1688 MB
Test 3:
Data: rows 50000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
Summary:
Attribute size: original: 31685 bytes, after compression: 30263 bytes
Time for select: head: 35.4 sec, patch: 25.5 sec, Performance Gain: 28%
table size: Head 1601 MB, Patch: 1601 MB
Test 4.a:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250) || generate_att_data(3885);
Summary:
Attribute size: original: 3885 bytes, after compression: 3645 bytes
Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
table size: Head 872 MB, Patch: 872 MB
Test 4.b (High compression):
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
Summary:
Attribute size: original: 3966 bytes, after compression: 2012 bytes
Time for select: head: 27 sec, patch: 26 sec, Performance Gain: 0%
table size: Head 612 MB, Patch: 872 MB
This is the worst case for this optimization, for the following 2 reasons.
First, the table size can grow by up to about 50%, which happens when the compressed
size is roughly half of the original size and yet barely large enough to cause TOASTing.
The table can't grow more than that, because if compression reduced the size any further
it would also reduce the #chunks, and then the compressed value would be stored.
Second, there is not much gain in performance, because with such a small attribute the
compressed form packs almost twice as many values per page, so each page access covers
roughly twice the number of rows; and a small value also means low
compression/decompression costs.
We have avoided such cases by applying the optimization only when the attribute size is > 4KB.
Regards,
Davinder
On Mon, Feb 28, 2022 at 3:22 PM davinder singh <davindersingh2692@gmail.com> wrote:
> The idea is to TOAST the compressed value only if we are saving at least
> 1 chunk (2KB default) of disk storage, else use the uncompressed one.

I think the idea looks interesting and the results are also promising. I have a few
initial comments; later I will do a more detailed review.

1.
+ if (*value == orig_toast_value)
+ toast_tuple_externalize(ttc, attribute, options);

Wouldn't it look cleaner to check whether the attribute is compressed or not, like
this: VARATT_IS_COMPRESSED(DatumGetPointer(*value))?

2.
+ /* Using the uncompressed data instead, deleting compressed data. */
+ pfree(DatumGetPointer(*value));

Change as below:
s/deleting compressed data/release memory for the compressed data/

3.
+ memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+ memcpy(toast_values_copy, toast_values, sizeof(toast_values));

Add some comment here: what are we trying to copy, and why?

4.
+ /* incompressible, ignore on subsequent compression passes. */
+ orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;

Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize
it? The comment says "ignore on subsequent compression passes",
but after this will there be more compression passes? If we do need to
set TOASTCOL_INCOMPRESSIBLE then the comment should explain this.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Thanks Dilip, I have fixed your comments, please find the updated patch.
On Tue, Mar 8, 2022 at 9:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ /* incompressible, ignore on subsequent compression passes. */
+ orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;
Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize
it? The comment says "ignore on subsequent compression passes",
but after this will there be more compression passes? If we do need to
set TOASTCOL_INCOMPRESSIBLE then the comment should explain this.
That was a mistake; this flag is not required at this point. Once the attribute is externalized it is marked as TOASTCOL_IGNORE, and such columns are not considered for compression, so I have removed it. Thanks for pointing it out.
Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com
On Thu, Mar 10, 2022 at 2:04 PM davinder singh <davindersingh2692@gmail.com> wrote:
> Thanks Dilip, I have fixed your comments, please find the updated patch.

Some more comments:

+ /*
+ * For those cases where storing compressed data is not optimal, We will use
+ * this pointer copy for referring uncompressed data.
+ */
+ memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+ memcpy(toast_values_copy, toast_values, sizeof(toast_values));

I think we can change the comment like below:

"Preserve references to the original uncompressed data before attempting the
compression, so that during externalization, if we decide not to store the
compressed data, we still have the original data with us. For more details refer
to the comments atop toast_tuple_opt_externalize."

+/*
+ * Don't save compressed data to external storage unless it saves I/O while
+ * accessing the same data by reducing the number of chunks.
+ */
+void
+toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute, int options,
+                            Datum orig_toast_value, ToastAttrInfo *orig_attr)

I think this comment does not explain what the problem with storing the compressed
data is. You need to expand it to say that if compression does not reduce the number
of chunks then there is no point in storing the compressed data, because then we will
have additional decompression cost whenever we fetch that data.

+ /* Sanity check: if data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+ toast_tuple_externalize(ttc, attribute, options);

I think we don't need "Sanity check:" here; the remaining part of the comment is fine.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
I think this is an interesting idea. My first thought is that it would be nice if we
didn't have to first compress the data to see whether we wanted to store it compressed
or not, but I don't think there is much we can do about that. In any case, we aren't
adding much work in the write path compared to the potential savings in the read path,
so that is probably okay.

Do you think it is worth making this configurable? I don't think it is outside the
realm of possibility for some users to care more about disk space than read
performance. And maybe the threshold for this optimization could differ based on the
workload.

 extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
                                     int options);
+extern void toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute,
+                                        int options, Datum old_toast_value,
+                                        ToastAttrInfo *old_toast_attr);

Could we bake this into toast_tuple_externalize() so that all existing callers benefit
from this optimization? Is there a reason to export both functions? Perhaps
toast_tuple_externalize() should be renamed and made static, and then this new
function could be called toast_tuple_externalize() (which would be a wrapper around
the internal function).

+ /* Sanity check: if data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+ toast_tuple_externalize(ttc, attribute, options);

With a --enable-cassert build, this line causes assertion failures in the call to
GetMemoryChunkContext() from pfree(). 'make check' is enough to reproduce it.
Specifically, it fails the following assertion:

    AssertArg(MemoryContextIsValid(context));

I didn't see an existing commitfest entry for this patch. I'd encourage you to add one
in the July commitfest:

    https://commitfest.postgresql.org/38/

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Sun, Mar 13, 2022 at 3:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Do you think it is worth making this configurable? I don't think it is
outside the realm of possibility for some users to care more about disk
space than read performance. And maybe the threshold for this optimization
could differ based on the workload.
I think here we can break it into two parts.
The first part: if the user cares about disk space more than read performance,
should we allow disabling this? That is a good idea, we can try it; let's see
what others say.
Regarding the 2nd part, configuring the threshold: based on our
experiments, we have fixed it for attributes with size > 2 * chunk_size.
The default chunk_size is 2KB and the page size is 8KB. While toasting, each
attribute is divided into chunks, and each page can hold at most 4 such chunks.
We only need to think about the space used by the last chunk of the attribute.
This means that for each optimized value we might use extra space in the range
(0B, 2KB]. I think this extra space is independent of the attribute size, so we
don't need to worry about configuring this threshold. Let me know if I missed
something here.
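(A tiny standalone sketch of why that bound holds, assuming the "save at least one chunk" rule and an approximate 2KB chunk size; the names and constants are illustrative, not the patch code:)

#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 2000              /* illustrative; ~TOAST_MAX_CHUNK_SIZE */

static int
nchunks(int size)
{
    return (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/*
 * Extra bytes kept per value when we give up on compression.  That only
 * happens when compression does not reduce the chunk count, and in that case
 * the difference is always less than one chunk, whatever the attribute size.
 */
static int
extra_bytes(int orig_size, int compressed_size)
{
    if (nchunks(compressed_size) < nchunks(orig_size))
        return 0;                    /* compressed form is stored */
    assert(orig_size - compressed_size < CHUNK_SIZE);
    return orig_size - compressed_size;
}

int
main(void)
{
    /* Test 2: 16505 -> 16050 bytes, same chunk count, 455 extra bytes of value data. */
    printf("%d\n", extra_bytes(16505, 16050));
    return 0;
}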
extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
int options);
+extern void toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute,
+ int options, Datum old_toast_value,
+ ToastAttrInfo *old_toast_attr);
Could we bake this into toast_tuple_externalize() so that all existing
callers benefit from this optimization? Is there a reason to export both
functions? Perhaps toast_tuple_externalize() should be renamed and made
static, and then this new function could be called
toast_tuple_externalize() (which would be a wrapper around the internal
function).
This function is used only in heap_toast_insert_or_update(), and all existing callers
use the new function only. As you suggested, I have renamed the new function to be the
wrapper and am now exporting only the new function.
+ /* Sanity check: if data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+ toast_tuple_externalize(ttc, attribute, options);
With a --enable-cassert build, this line causes assertion failures in the
call to GetMemoryChunkContext() from pfree(). 'make check' is enough to
reproduce it. Specifically, it fails the following assertion:
AssertArg(MemoryContextIsValid(context));
Thanks for pointing it out. This failure started because I was not handling the
case where the data was already compressed even before toasting. The following
check verifies whether the data is compressed or not, but that is not enough,
because we can't optimize the toasting if we didn't get the data in
uncompressed form from the source in the first place.
+ /* If data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
The v1 patch didn't have this problem because it verified whether we had
compressed the data in this toasting round or not. I have changed back to the
earlier version:
+ /* If data is not compressed then we can proceed as usual. */
+ if (*value == orig_toast_value)
Regards,
Davinder
On Wed, Mar 16, 2022 at 01:03:38AM +0530, davinder singh wrote:
> Regarding the 2nd part, configuring the threshold: based on our
> experiments, we have fixed it for attributes with size > 2 * chunk_size.
> The default chunk_size is 2KB and the page size is 8KB. While toasting, each
> attribute is divided into chunks, and each page can hold at most 4 such chunks.
> We only need to think about the space used by the last chunk of the attribute.
> This means that for each optimized value we might use extra space in the range
> (0B, 2KB]. I think this extra space is independent of the attribute size, so we
> don't need to worry about configuring this threshold. Let me know if I missed
> something here.

Looking closer, ISTM there are two thresholds here. The first threshold is the minimum
size of the uncompressed data required to trigger this optimization. The purpose behind
this threshold is to avoid increasing disk space usage in return for very little gain in
the read path (i.e., skipping decompression). Specifically, this threshold helps avoid
increasing disk space usage when there are many small TOAST attributes. You've chosen
2x the TOAST chunk size as the threshold.

The second threshold is the compression ratio required to trigger this optimization.
The purpose behind this threshold is to avoid storing poorly compressed data so that we
can improve performance in the read path (i.e., skipping decompression). You've chosen
>=1 TOAST chunk as the threshold. This means that when I have an attribute with size 3x
a TOAST chunk, I'll only store it compressed when the compressed attribute is 2x a TOAST
chunk or smaller (a 33% improvement). If I have an attribute with size 100x a TOAST
chunk, I'll store it compressed when the compressed attribute is 99x a TOAST chunk or
smaller (a 1% improvement). IIUC this means that it is much more likely that you will
store a compressed attribute when the uncompressed data is larger. The benefit of this
approach is that you know you'll never give up more than one TOAST chunk of disk space
in the name of faster reads. The downside is that a huge number of smaller attributes
can require much more disk space than if they were universally compressed.

Perhaps this is missing one additional threshold. Perhaps there should be a compression
ratio for which the optimization is ignored. If my uncompressed data is 3x a TOAST chunk
and compressing it would save half of a TOAST chunk of disk space (a ~17% improvement),
maybe we should still store it compressed. I think this threshold would be most
important for servers with many small TOAST attributes.

I don't know whether it is necessary to make all of this configurable or whether we
should just use some heuristics to find a good balance. In any case, I imagine the
settings should look different for the different TOAST compression methods. Or maybe we
should just make the first threshold the configuration option for this feature. If you
set it to -1, the optimization is never used. If you set it to 0, you always try to save
on read time at the expense of disk space. If you set it to 2x the TOAST chunk size, you
might use up to 50% more disk space for read path optimizations. If you set it to 100x
the TOAST chunk size, you might use up to 1% more disk space for read optimizations. By
default, this configuration option could be set conservatively (e.g., 10x a TOAST chunk,
which could lead to a maximum of 10% more disk space).
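(For reference, a rough standalone sketch of that arithmetic; the function name is made up, and the model assumes the uncompressed value occupies exactly a whole number of full-size chunks:)

#include <stdio.h>

/*
 * Fraction of the uncompressed size that compression must save before the
 * "save at least one chunk" rule stores the compressed form, assuming the
 * uncompressed value occupies exactly nchunks full-size chunks.
 */
static double
required_saving(int nchunks)
{
    /* the compressed form must fit into nchunks - 1 chunks */
    return 1.0 / nchunks;
}

int
main(void)
{
    printf("%.1f%%\n", 100 * required_saving(3));     /* 33.3% */
    printf("%.1f%%\n", 100 * required_saving(100));   /* 1.0% */
    return 0;
}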
I suspect the increase in disk space would be much less than these numbers in most
cases, but it might be helpful to think of these things in terms of the worst case
scenario.

I apologize for thinking out loud a bit here, but I hope this gives you some insight
into my perspective on this. In general, I am skeptical that we can choose one
threshold that will work for all PostgreSQL installations in the known universe.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, Mar 15, 2022 at 5:48 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I apologize for thinking out loud a bit here, but I hope this gives you
> some insight into my perspective on this. In general, I am skeptical that
> we can choose one threshold that will work for all PostgreSQL installations
> in the known universe.

I would like to take a slightly contrary position. I think that a system here that
involves multiple knobs is going to be too complicated to be of any real-world use,
because almost nobody will understand it or tune it properly for their installation.
And who is to say that a single setting would even be appropriate for a whole
installation, as opposed to something that needs to be tuned for each individual table?
A single tunable might be OK, but what I would really like here is a heuristic that,
while perhaps not optimal in every environment, is good enough that a very high
percentage of users will never need to worry about it. In a case like this, a small
gain that happens automatically feels better than a large gain that requires manual
tweaking in every install.

Now that doesn't answer the question of what that heuristic should be. I initially
thought that if compression didn't end up reducing the number of TOAST chunks then
there was no point, but that's not true, because having the last chunk be smaller than
the rest can allow us to fit more than 4 into a page rather than exactly 4. However,
this effect isn't likely to be important if most chunks are full-size chunks. If we
insert 100 full-size chunks and 1 partial chunk at the end, we can't save much even if
that chunk ends up being 1 byte instead of a full-size chunk. 25 TOAST table pages out
of 26 are going to be full of full-size chunks either way, so we can't save more than
~4% and we might easily save nothing at all. As you say, the potential savings are
larger as the values get smaller, because a higher proportion of the TOAST table pages
are not quite full. So I think the only cases where it's even interesting to think
about suppressing this optimization are those where the value is quite small -- and
maybe the thing to do is just pick a conservatively large threshold and call it good.

For example, say that instead of applying this technique when there are at least 2
TOAST chunks, we apply it when there are at least 500 (i.e. normally 1MB). It's hard
for me to imagine that we can ever lose, and in fact I think we could come down quite a
bit from there and still end up winning pretty much all the time. Setting the threshold
to, say, 50 or 100 or 150 TOAST chunks instead of 2 may leave some money on the table,
but if it means that we don't have meaningful downsides and we don't have tunables for
users to fiddle with, it might be worth it.

--
Robert Haas
EDB: http://www.enterprisedb.com
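(A back-of-the-envelope sketch of that upper bound, assuming 4 full-size chunks per 8KB TOAST page; the name and constants are illustrative only, not patch code:)

#include <stdio.h>

#define CHUNKS_PER_PAGE 4            /* assumed full-size TOAST chunks per 8KB page */

/*
 * Rough upper bound on the fraction of TOAST pages that shrinking only the
 * last, partial chunk can save: every page except the one holding that chunk
 * is full of full-size chunks either way.
 */
static double
max_saving(int nchunks)
{
    int pages = (nchunks + CHUNKS_PER_PAGE - 1) / CHUNKS_PER_PAGE;

    return 1.0 / pages;
}

int
main(void)
{
    printf("%.1f%%\n", 100 * max_saving(101));   /* ~3.8%: 100 full chunks + 1 partial */
    printf("%.1f%%\n", 100 * max_saving(5));     /* 50.0%: small values have more to gain */
    return 0;
}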
On Wed, Mar 16, 2022 at 10:08:45AM -0400, Robert Haas wrote:
> A single tunable might be OK, but what I would really like here is a
> heuristic that, while perhaps not optimal in every environment, is good
> enough that a very high percentage of users will never need to worry about
> it. In a case like this, a small gain that happens automatically feels
> better than a large gain that requires manual tweaking in every install.

Agreed. If there is a tunable, it needs to be as simple as possible. And ideally most
users would never need to use it because the default would be "good enough."

> For example, say that instead of applying this technique when there are at
> least 2 TOAST chunks, we apply it when there are at least 500 (i.e. normally
> 1MB). It's hard for me to imagine that we can ever lose, and in fact I think
> we could come down quite a bit from there and still end up winning pretty
> much all the time. Setting the threshold to, say, 50 or 100 or 150 TOAST
> chunks instead of 2 may leave some money on the table, but if it means that
> we don't have meaningful downsides and we don't have tunables for users to
> fiddle with, it might be worth it.

As long as we can demonstrate some decent improvements, I think using a conservative
threshold is a good idea. I do wonder whether this could weaken the read performance
gains quite a bit, though.

Thinking further, is simply reducing the number of TOAST chunks the right thing to look
at? If I want to add a TOAST attribute that requires 100,000 chunks, and you told me
that I could save 10% in the read path for an extra 250 chunks of disk space, I would
probably choose read performance. If I wanted to add 100,000 attributes that were each
3 chunks, and you told me that I could save 10% in the read path for an extra 75,000
chunks of disk space, I might choose the extra disk space. These are admittedly extreme
(and maybe even impossible) examples, but my point is that the amount of disk space you
are willing to give up may be related to the size of the attribute. And maybe one way
to extract additional read performance with this optimization is to use a variable
threshold so that we are more likely to use it for large attributes.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Mar 16, 2022 at 11:36:56AM -0700, Nathan Bossart wrote:
> These are admittedly extreme (and maybe even impossible) examples, but my
> point is that the amount of disk space you are willing to give up may be
> related to the size of the attribute. And maybe one way to extract
> additional read performance with this optimization is to use a variable
> threshold so that we are more likely to use it for large attributes.

I might be overthinking this. Maybe it is enough to skip compressing the attribute
whenever compression saves no more than some percentage of the uncompressed attribute
size. A conservative default setting might be something like 5% or 10%.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Mar 16, 2022 at 2:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> These are admittedly extreme (and maybe even impossible) examples, but my
> point is that the amount of disk space you are willing to give up may be
> related to the size of the attribute. And maybe one way to extract
> additional read performance with this optimization is to use a variable
> threshold so that we are more likely to use it for large attributes.

Right, so perhaps the ultimate thing here would be a more fine-grained knob than SET
STORAGE EXTERNAL -- something that allows you to specify that you want to compress only
when it really helps. While some people might find that useful, I think the current
patch is less ambitious, and I think that's OK. It just wants to save something in the
cases where it's basically free. Unfortunately we've learned that it's never *entirely*
free because making the last TOAST chunk longer can always cost you something, even if
it gets longer by only 1 byte. But for larger values it's hard for that to be
significant.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 17, 2022 at 01:04:17PM -0400, Robert Haas wrote:
> Right, so perhaps the ultimate thing here would be a more fine-grained
> knob than SET STORAGE EXTERNAL -- something that allows you to specify
> that you want to compress only when it really helps. While some people
> might find that useful, I think the current patch is less ambitious,
> and I think that's OK. It just wants to save something in the cases
> where it's basically free. Unfortunately we've learned that it's never
> *entirely* free because making the last TOAST chunk longer can always
> cost you something, even if it gets longer by only 1 byte. But for
> larger values it's hard for that to be significant.

I guess I think we should be slightly more ambitious. One idea could be to create a
default_toast_compression_ratio GUC with a default of 0.95. This means that, by
default, a compressed attribute must be 0.95x or less of the size of the uncompressed
attribute to be stored compressed. Like default_toast_compression, this could also be
overridden at the column level with something like

    ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;

If the current "basically free" patch is intended for v15, then maybe all this extra
configurability stuff could wait for a bit, especially if we can demonstrate a decent
read performance boost with a more conservative setting. However, I don't see anything
terribly complicated about the proposed configurability changes (assuming my proposal
makes some amount of sense to you and others).

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Mar 17, 2022 at 4:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Thu, Mar 17, 2022 at 01:04:17PM -0400, Robert Haas wrote:
> > Right, so perhaps the ultimate thing here would be a more fine-grained
> > knob than SET STORAGE EXTERNAL -- something that allows you to specify
> > that you want to compress only when it really helps.
>
> I guess I think we should be slightly more ambitious. One idea could be to
> create a default_toast_compression_ratio GUC with a default of 0.95. This
> means that, by default, a compressed attribute must be 0.95x or less of the
> size of the uncompressed attribute to be stored compressed. Like
> default_toast_compression, this could also be overridden at the column
> level with something like
>
>     ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;
>
> If the current "basically free" patch is intended for v15, then maybe all
> this extra configurability stuff could wait for a bit, especially if we can
> demonstrate a decent read performance boost with a more conservative
> setting. However, I don't see anything terribly complicated about the
> proposed configurability changes (assuming my proposal makes some amount of
> sense to you and others).

We seem to have a shortage of "others" showing up with opinions on this topic, but I
guess I'm not very confident about the general utility of such a setting.

Just to be clear, I'm also not very confident about the usefulness of the existing
settings for controlling TOAST. Why is it useful default behavior to try to get rows
down to 2kB by default, rather than 1.8kB or 3kB? Even more, why don't we try to
compress primarily based on the length of individual attributes and then compress
further only if the resulting tuple doesn't fit into a page at all? There doesn't seem
to be anything magic about fitting tuples into a quarter-page, yet the code acts as
though that's the primary goal - and then, when that didn't turn out to work well in
all cases, we added a user-settable parameter (toast_tuple_target) to let you say you
really want tuples in table X to fit into a third of a page or a fifth of a page
instead of a quarter. And there's some justification for that: a proposal to
fundamentally change the algorithm would likely have gotten bogged down for years, and
the parameter no doubt lets you solve some problems. Yet if the whole algorithm is
wrong, and I think maybe it is, then being able to change the constants is not really
getting us where we need to be.

Then, too, I'm not very confident about the usefulness of EXTENDED, EXTERNAL, and MAIN.
I think it's useful to be able to categorically disable compression (as EXTERNAL does),
but you can't categorically disable out-of-line storage because the value can be bigger
than the page, so MAIN vs. EXTENDED is just changing the threshold for the use of
out-of-line storage. However, it does so in a way that IMHO is not particularly
intuitive, which goes back to my earlier point about the algorithm seeming wacky, and
it's in any case unclear why we should offer exactly two possible thresholds and not
any of the intermediate values.

I think I'm prepared to concede that the setting you are proposing here is easier to
understand than either of those. Saying "store this value compressed only if you can
thereby reduce the size by at least 10%" is an understandable directive which I think
will make sense even to people who know very little about PostgreSQL or databases
generally, as long as they have some vague idea how compression works. However, I guess
I am not confident that the setting would get much use, or that it is the best way to
measure what we care about. For instance, if my value is 19kB in length, saving 10%
isn't very impactful, because it's still the same number of chunks, and the only
benefit I gain is that the last chunk is smaller, which is probably not very impactful
because I'm still going to have 8 full chunks. I will gain more if the value is bigger
or smaller. Going up, if I increase the size to 19MB or 190MB, I'm really saving a very
significant amount of cache space and, potentially, I/O. If I was happy with a 10%
savings at 19kB, I will likely be happy with a considerably smaller economy here. Going
down, if I decrease the size to 1.9kB, I'm not storing any full-sized chunks any more,
so reducing the size of the one partial chunk that I'm storing seems like it could have
a real impact, provided of course that other toasted values in the same table are
similarly small. If the whole TOAST table is full of just partial chunks, then the
smaller they are the more will fit into a page, all things being equal, and thus the
smaller the TOAST table will be.

Maybe the conclusion here is that more thought is needed before changing anything in
this area....

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Mar 22, 2022 at 04:34:05PM -0400, Robert Haas wrote:
> Then, too, I'm not very confident about the usefulness of EXTENDED,
> EXTERNAL, and MAIN. I think it's useful to be able to categorically
> disable compression (as EXTERNAL does), but you can't categorically
> disable out-of-line storage because the value can be bigger than the
> page, so MAIN vs. EXTENDED is just changing the threshold for the use
> of out-of-line storage. However, it does so in a way that IMHO is not
> particularly intuitive, which goes back to my earlier point about the
> algorithm seeming wacky, and it's in any case unclear why we should
> offer exactly two possible thresholds and not any of the intermediate
> values.

I agree with all of this. Adding configurability for each constant might help folks in
the short term, but using these knobs probably requires quite a bit of expertise in
Postgres internals as well as a good understanding of your data. I think we ought to
revisit TOAST configurability from a user perspective. IOW, what can be chosen
automatically, and how do we enable users to effectively configure the settings that
cannot be chosen automatically? IMO this is a worthwhile conversation to have as long
as it doesn't stray too far into the "let's rewrite TOAST" realm. I think there is
always going to be some level of complexity with stuff like TOAST, but there are
probably all sorts of ways to simplify/improve it also.

> Maybe the conclusion here is that more thought is needed before changing
> anything in this area....

You've certainly got me thinking more about this. If the scope of this work is going to
expand, a few months before the first v16 commitfest is probably the right time for it.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, Mar 22, 2022 at 02:42:53PM -0700, Nathan Bossart wrote:
> On Tue, Mar 22, 2022 at 04:34:05PM -0400, Robert Haas wrote:
>> Then, too, I'm not very confident about the usefulness of EXTENDED,
>> EXTERNAL, and MAIN.
>
> I agree with all of this. Adding configurability for each constant might
> help folks in the short term, but using these knobs probably requires quite
> a bit of expertise in Postgres internals as well as a good understanding of
> your data. I think we ought to revisit TOAST configurability from a user
> perspective. IOW, what can be chosen automatically, and how do we enable
> users to effectively configure the settings that cannot be chosen
> automatically? IMO this is a worthwhile conversation to have as long as it
> doesn't stray too far into the "let's rewrite TOAST" realm. I think there
> is always going to be some level of complexity with stuff like TOAST, but
> there are probably all sorts of ways to simplify/improve it also.

I agree with this feeling. TOAST already has too many configuration parameters that
each behave in their own slightly different way. If we could reduce that number rather
than increase it, so much the better. I would also be curious to see how relevant those
parameters remain with more compression options possible for toast values.

--
Michael
On Wed, Mar 23, 2022 at 9:58 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Mar 22, 2022 at 02:42:53PM -0700, Nathan Bossart wrote:
> > I agree with all of this. Adding configurability for each constant might
> > help folks in the short term, but using these knobs probably requires quite
> > a bit of expertise in Postgres internals as well as a good understanding of
> > your data. I think we ought to revisit TOAST configurability from a user
> > perspective.
>
> I agree with this feeling. TOAST already has too many configuration
> parameters that each behave in their own slightly different way. If we
> could reduce that number rather than increase it, so much the better. I
> would also be curious to see how relevant those parameters remain with
> more compression options possible for toast values.

I agree with you; it doesn't make much sense here to add an extra user-visible
configuration parameter when it is not really clear to users how to configure it to get
the best out of it.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Mar 18, 2022 at 1:35 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I guess I think we should be slightly more ambitious. One idea could be to
> create a default_toast_compression_ratio GUC with a default of 0.95. This
> means that, by default, a compressed attribute must be 0.95x or less of the
> size of the uncompressed attribute to be stored compressed. Like
> default_toast_compression, this could also be overridden at the column
> level with something like
>
>     ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;

I am not sure that we want a GUC to control that, but we can certainly be more
ambitious. Basically, in the current patch, if the data is even slightly large then we
would always prefer to store the compressed data; e.g., if the data size is 200kB then
even if the compression ratio is as low as 1% we would choose to store the compressed
data. I think we can make it based on the compression ratio and then upper-bound it
with the chunk difference. For example, if the compression ratio is < 10%, then store
the data uncompressed only if the chunk difference is < threshold. But with that we
might see a performance impact on smaller data whose compression ratio is < 10%,
because its chunk difference will always be under the threshold. So maybe the chunk
difference threshold could be a function of the total number of chunks required for the
data, perhaps a logarithmic function so that the threshold grows slowly along with the
base data size.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
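(As a purely illustrative sketch of that last idea -- the function name, constants, and choice of logarithm are made up, not part of any patch:)

#include <math.h>
#include <stdio.h>

/*
 * Hypothetical threshold: how many chunks compression must save before we
 * bother storing the compressed form, growing slowly (logarithmically) with
 * the total number of chunks the uncompressed data would need.
 */
static int
chunk_saving_threshold(int total_chunks)
{
    int threshold = (int) ceil(log2((double) total_chunks));

    return threshold < 1 ? 1 : threshold;
}

int
main(void)
{
    printf("%d\n", chunk_saving_threshold(4));       /* 2 */
    printf("%d\n", chunk_saving_threshold(100));     /* 7 */
    printf("%d\n", chunk_saving_threshold(100000));  /* 17 */
    return 0;
}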