Thread: Optimize external TOAST storage

Optimize external TOAST storage

From
davinder singh
Date:
Hi,

For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
and then divided into chunks. The problem with storing the compressed value is
that if compression does not save enough space to reduce the number of chunks,
we end up paying extra decompression cost on every read.
Based on a discussion with Robert and Dilip, we tried to optimize this
process. The idea is to TOAST the compressed value only if doing so saves at
least one chunk (2KB by default) of disk storage; otherwise, we store the
uncompressed value. In this way, we save decompression costs without losing
much on storage.
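
For illustration, a minimal sketch of this rule (the names and the chunk
arithmetic are simplified stand-ins, not the actual patch code):

#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 2000          /* stand-in for TOAST_MAX_CHUNK_SIZE (~2KB) */

/* Decide whether the compressed copy is worth externalizing. */
static bool
keep_compressed(size_t uncompressed_size, size_t compressed_size)
{
    size_t uchunks = (uncompressed_size + CHUNK_SIZE - 1) / CHUNK_SIZE;
    size_t cchunks = (compressed_size + CHUNK_SIZE - 1) / CHUNK_SIZE;

    /* Store the compressed copy only if it needs fewer chunks; otherwise
     * store the uncompressed value and skip decompression on every read. */
    return cchunks < uchunks;
}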

In our tests, we have observed read performance improvements of up to 28%
while giving up at most TOAST_MAX_CHUNK_SIZE (2KB) of disk storage per value.
The gain is larger for large attributes (size > 4KB) because
compression/decompression cost increases with size.
However, we have found that this assumption does not hold when the data
compresses very well and the size is barely big enough to cause TOASTing. For
example, in test 4.b below, we did not get any performance advantage, but the
table size grew by 42% by storing uncompressed values.
Test Setup.

Create table t1_lz4 ( a text compression lz4, b text compression lz4);
-- Generate random data
create or replace function generate_att_data(len_info int)
returns text
language plpgsql
as
$$
declare
   value text;
begin
   -- each md5() is 32 hex chars; with array separators this yields
   -- roughly len_info bytes of text
   select array_agg(md5(g::text))
   into value
   from generate_series(1, round(len_info/33)::int) g;
   return value;
end;
$$;

--Test
Select b from t1_lz4;

Test 1:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
Summary:
Attribute size: original: 7925 bytes, after compression: 7845 bytes
Time for select: head: 42  sec, patch: 37 sec, Performance Gain: 11%
table size: Head 1662 MB, Patch: 1662 MB

Test 2:
Data: rows 100000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
Summary:
Attribute size: original: 16505 bytes, after compression: 16050 bytes
Time for select: head: 35.6  sec, patch: 30 sec, Performance Gain: 14%
table size: Head 1636 MB, Patch: 1688 MB

Test 3:
Data: rows 50000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
Summary:
Attribute size: original: 31685 bytes, after compression: 30263 bytes
Time for select: head: 35.4  sec, patch: 25.5 sec, Performance Gain: 28%
table size: Head 1601 MB, Patch: 1601 MB

Test 4.a:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250) || generate_att_data(3885);
Summary:
Attribute size: original: 3885 bytes, after compression: 3645 bytes
Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
table size: Head 872 MB, Patch: 872 MB

Test 4.b (High compression):
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
Summary:
Attribute size: original: 3966 bytes, after compression: 2012 bytes
Time for select: head: 27  sec, patch: 26 sec, Performance Gain: 0%
table size: Head 612 MB, Patch: 872 MB

This is the worst case for this optimization, for two reasons.
First, the table size grows by up to 50% when the compressed size is about half
of the original size and yet barely large enough for TOASTing. The table size
can't grow more than that, because if compression reduces the size any further
it also reduces the number of chunks, and then we store the compressed value.

Second, there is not much gain in performance because the attribute is small:
almost twice as many attributes fit in a page, so each page access returns
about twice the number of rows, and a small value also means low
compression/decompression costs.


We avoid such cases by applying the optimization only when the attribute size is > 4KB.


Regards,
Davinder
Attachment

Re: Optimize external TOAST storage

From
Dilip Kumar
Date:
On Mon, Feb 28, 2022 at 3:22 PM davinder singh
<davindersingh2692@gmail.com> wrote:
>
> Hi,
>
> For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
> and then divided into chunks. The problem with storing the compressed value is
> that if compression does not save enough space to reduce the number of chunks,
> we end up paying extra decompression cost on every read.
> Based on a discussion with Robert and Dilip, we tried to optimize this
> process. The idea is to TOAST the compressed value only if doing so saves at
> least one chunk (2KB by default) of disk storage; otherwise, we store the
> uncompressed value. In this way, we save decompression costs without losing
> much on storage.
>
> In our tests, we have observed read performance improvements of up to 28%
> while giving up at most TOAST_MAX_CHUNK_SIZE (2KB) of disk storage per value.
> The gain is larger for large attributes (size > 4KB) because
> compression/decompression cost increases with size.
> However, we have found that this assumption does not hold when the data
> compresses very well and the size is barely big enough to cause TOASTing. For
> example, in test 4.b below, we did not get any performance advantage, but the
> table size grew by 42% by storing uncompressed values.
> Test Setup.
>
> Create table t1_lz4 ( a text compression lz4, b text compression lz4);
> -- Generate random data
> create or replace function generate_att_data(len_info int)
> returns text
> language plpgsql
> as
> $$
> declare
>    value text;
> begin
>    select array_agg(md5(g::text))
>    into value
>    from generate_series(1, round(len_info/33)::int) g;
>    return value;
> end;
> $$;
>
> --Test
> Select b from t1_lz4;
>
> Test 1:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 7925 bytes, after compression: 7845 bytes
> Time for select: head: 42  sec, patch: 37 sec, Performance Gain: 11%
> table size: Head 1662 MB, Patch: 1662 MB
>
> Test 2:
> Data: rows 100000
> insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
> Summary:
> Attribute size: original: 16505 bytes, after compression: 16050 bytes
> Time for select: head: 35.6  sec, patch: 30 sec, Performance Gain: 14%
> table size: Head 1636 MB, Patch: 1688 MB
>
> Test 3:
> Data: rows 50000
> insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
> Summary:
> Attribute size: original: 31685 bytes, after compression: 30263 bytes
> Time for select: head: 35.4  sec, patch: 25.5 sec, Performance Gain: 28%
> table size: Head 1601 MB, Patch: 1601 MB
>
> Test 4.a:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250) || generate_att_data(3885);
> Summary:
> Attribute size: original: 3885 bytes, after compression: 3645 bytes
> Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
> table size: Head 872 MB, Patch: 872 MB
>
> Test 4.b (High compression):
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 3966 bytes, after compression: 2012 bytes
> Time for select: head: 27  sec, patch: 26 sec, Performance Gain: 0%
> table size: Head 612 MB, Patch: 872 MB

I think the idea looks interesting and the results are also promising.
I have a few initial comments, later I will do more detailed review.

1.
+    if (*value == orig_toast_value)
+        toast_tuple_externalize(ttc, attribute, options);

Wouldn't it look cleaner to check whether the attribute is compressed or
not, like this:
VARATT_IS_COMPRESSED(DatumGetPointer(*value)) ?

2.
+            /* Using the uncompressed data instead, deleting
compressed data. */
+            pfree(DatumGetPointer(*value));

Change it as below:
s/deleting compressed data/release memory for the compressed data/

3.
+    memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+    memcpy(toast_values_copy, toast_values, sizeof(toast_values));
+

Add a comment here explaining what we are trying to copy and why.

4.
+            /* incompressible, ignore on subsequent compression passes. */
+            orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;

Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize
it?  The comment says "ignore on subsequent compression passes",
but after this will there be more compression passes?  If we do need to
set TOASTCOL_INCOMPRESSIBLE, then the comment should explain this.




-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Optimize external TOAST storage

From
davinder singh
Date:
Thanks, Dilip. I have addressed your comments; please find the updated patch.

On Tue, Mar 8, 2022 at 9:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> +            /* incompressible, ignore on subsequent compression passes. */
> +            orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;
>
> Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize
> it?  The comment says "ignore on subsequent compression passes",
> but after this will there be more compression passes?  If we do need to
> set TOASTCOL_INCOMPRESSIBLE, then the comment should explain this.
That was a mistake; this flag is not required at this point. Once the attribute is externalized it is marked as TOASTCOL_IGNORE, and such columns are not considered for compression, so I have removed it. Thanks for pointing it out.

--
Regards,
Davinder
Attachment

Re: Optimize external TOAST storage

From
Dilip Kumar
Date:
On Thu, Mar 10, 2022 at 2:04 PM davinder singh
<davindersingh2692@gmail.com> wrote:
>
> Thanks, Dilip. I have addressed your comments; please find the updated patch.
>

Some more comments

+    /*
+     * For those cases where storing compressed data is not optimal,
We will use
+     * this pointer copy for referring uncompressed data.
+     */
+    memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+    memcpy(toast_values_copy, toast_values, sizeof(toast_values));

I think we can change the comment like below:

"Preserve references to the original uncompressed data before attempting
compression, so that if we decide during externalization not to store the
compressed data, we still have the original data with us.  For more details,
refer to the comments atop toast_tuple_opt_externalize".

+/*
+ * Don't save compressed data to external storage unless it saves I/O while
+ * accessing the same data by reducing the number of chunks.
+ */
+void
+toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute, int options,
+                            Datum orig_toast_value, ToastAttrInfo *orig_attr)

I think this comment does not explain what the problem is with storing the
compressed data.  You need to expand it to say that if compression does not
reduce the number of chunks, then there is no point in storing the compressed
data, because we would pay an additional decompression cost whenever we fetch
that data.


+
+    /* Sanity check: if data is not compressed then we can proceed as usual. */
+    if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+        toast_tuple_externalize(ttc, attribute, options);

I think we don't need "Sanity check:" here; the remaining part of the
comment is fine.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Optimize external TOAST storage

From
Nathan Bossart
Date:
I think this is an interesting idea.  My first thought is that it would be
nice if we didn't have to first compress the data to see whether we wanted
to store it compressed or not, but I don't think there is much we can do
about that.  In any case, we aren't adding much work in the write path
compared to the potential savings in the read path, so that is probably
okay.

Do you think it is worth making this configurable?  I don't think it is
outside the realm of possibility for some users to care more about disk
space than read performance.  And maybe the threshold for this optimization
could differ based on the workload.

 extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
                                     int options);
+extern void toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute,
+                                        int options, Datum old_toast_value,
+                                        ToastAttrInfo *old_toast_attr);

Could we bake this into toast_tuple_externalize() so that all existing
callers benefit from this optimization?  Is there a reason to export both
functions?  Perhaps toast_tuple_externalize() should be renamed and made
static, and then this new function could be called
toast_tuple_externalize() (which would be a wrapper around the internal
function).

+    /* Sanity check: if data is not compressed then we can proceed as usual. */
+    if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+        toast_tuple_externalize(ttc, attribute, options);

With a --enable-cassert build, this line causes assertion failures in the
call to GetMemoryChunkContext() from pfree().  'make check' is enough to
reproduce it.  Specifically, it fails the following assertion:

    AssertArg(MemoryContextIsValid(context));

I didn't see an existing commitfest entry for this patch.  I'd encourage
you to add one in the July commitfest:

    https://commitfest.postgresql.org/38/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Optimize external TOAST storage

From
davinder singh
Date:

Thanks, Nathan, for the review comments. Please find the updated patch.
On Sun, Mar 13, 2022 at 3:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote: 
> Do you think it is worth making this configurable?  I don't think it is
> outside the realm of possibility for some users to care more about disk
> space than read performance.  And maybe the threshold for this optimization
> could differ based on the workload.

I think we can break this into two parts.
The first part: if the user cares more about disk space than read performance, should we let them disable this?
That is a good idea; we can try it, let's see what others say.

Regarding the second part, configuring the threshold: based on our
experiments, we have fixed it at attributes with size > 2 * chunk_size.
The default chunk_size is 2KB and the page size is 8KB. While toasting, each
attribute is divided into chunks, and each page can hold at most 4 such chunks.
We only need to think about the space used by the last chunk of the attribute,
so each optimized value might use extra space in the range (0B, 2KB]. This
extra space is bounded independently of the attribute size, so I don't think
we need to worry about configuring this threshold. Let me know if I missed
something here.
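
As a back-of-the-envelope illustration of that bound (the constants and
attribute sizes here are only illustrative):

#include <stdio.h>

#define CHUNK_SIZE 2000          /* stand-in for TOAST_MAX_CHUNK_SIZE */

int
main(void)
{
    /* roughly the attribute sizes used in the tests above */
    size_t sizes[] = {4000, 8000, 16000, 32000};

    /* the extra space per value is at most one chunk, so the relative
     * overhead shrinks as the attribute grows */
    for (int i = 0; i < 4; i++)
        printf("attr %6zu bytes: worst-case extra space <= %zu bytes (%.1f%%)\n",
               sizes[i], (size_t) CHUNK_SIZE, 100.0 * CHUNK_SIZE / sizes[i]);
    return 0;
}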


>  extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
>                                      int options);
> +extern void toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute,
> +                                        int options, Datum old_toast_value,
> +                                        ToastAttrInfo *old_toast_attr);
>
> Could we bake this into toast_tuple_externalize() so that all existing
> callers benefit from this optimization?  Is there a reason to export both
> functions?  Perhaps toast_tuple_externalize() should be renamed and made
> static, and then this new function could be called
> toast_tuple_externalize() (which would be a wrapper around the internal
> function).
This function is used only in heap_toast_insert_or_update(), so all existing
callers are using the new function. As you suggested, I have renamed the new
function to serve as the wrapper, and only the new function is exported now.
 

> +    /* Sanity check: if data is not compressed then we can proceed as usual. */
> +    if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
> +        toast_tuple_externalize(ttc, attribute, options);
>
> With a --enable-cassert build, this line causes assertion failures in the
> call to GetMemoryChunkContext() from pfree().  'make check' is enough to
> reproduce it.  Specifically, it fails the following assertion:
>
>         AssertArg(MemoryContextIsValid(context));

Thanks for pointing it out. This failure happened because I was not handling
the case where the data is already compressed even before toasting. The
following check verifies whether the data is compressed, but that is not
enough, because we can't apply the optimization if we didn't get the data in
uncompressed form from the source in the first place.
+    /* If data is not compressed then we can proceed as usual. */
+    if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))

The v1 patch didn't have this problem because it checked whether the data was
compressed in this toasting round. I have changed back to the earlier version.
+    /* If data is not compressed then we can proceed as usual. */
+    if (*value == orig_toast_value)



--
Regards,
Davinder
Attachment

Re: Optimize external TOAST storage

From
Nathan Bossart
Date:
On Wed, Mar 16, 2022 at 01:03:38AM +0530, davinder singh wrote:
> Regarding the second part, configuring the threshold: based on our
> experiments, we have fixed it at attributes with size > 2 * chunk_size.
> The default chunk_size is 2KB and the page size is 8KB. While toasting, each
> attribute is divided into chunks, and each page can hold at most 4 such chunks.
> We only need to think about the space used by the last chunk of the attribute,
> so each optimized value might use extra space in the range (0B, 2KB]. This
> extra space is bounded independently of the attribute size, so I don't think
> we need to worry about configuring this threshold. Let me know if I missed
> something here.

Looking closer, ISTM there are two thresholds here.  The first threshold is
the minimum size of the uncompressed data required to trigger this
optimization.  The purpose behind this threshold is to avoid increasing
disk space usage in return for very little gain in the read path (i.e.,
skipping decompression).  Specifically, this threshold helps avoid
increasing disk space usage when there are many small TOAST attributes.
You've chosen 2x the TOAST chunk size as the threshold.

The second threshold is the compression ratio required to trigger this
optimization.  The purpose behind this threshold is to avoid storing poorly
compressed data so that we can improve performance in the read path (i.e.,
skipping decompression).  You've chosen >=1 TOAST chunk as the threshold.
This means that when I have an attribute with size 3x a TOAST chunk, I'll
only store it compressed when the compressed attribute is 2x a TOAST chunk
or smaller (a 33% improvement).  If I have an attribute with size 100x a
TOAST chunk, I'll store it compressed when the compressed attribute is 99x
a TOAST chunk or smaller (a 1% improvement).
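
To make that concrete, a small illustration of how the saving required to
drop one chunk shrinks as the value grows (illustrative arithmetic only):

#include <stdio.h>

#define CHUNK_SIZE 2000          /* stand-in for TOAST_MAX_CHUNK_SIZE */

int
main(void)
{
    int sizes_in_chunks[] = {3, 10, 100};

    for (int i = 0; i < 3; i++)
    {
        int     n = sizes_in_chunks[i];
        /* To go from n chunks down to n-1, the compressed value must save
         * at least one chunk, i.e. roughly 1/n of the original size. */
        double  required_saving = 1.0 / n;

        printf("%3d chunks: compression must save >= %4.1f%%\n",
               n, required_saving * 100.0);
    }
    return 0;
}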

IIUC this means that it is much more likely that you will store a
compressed attribute when the uncompressed data is larger.  The benefit of
this approach is that you know you'll never give up more than one TOAST
chunk of disk space in the name of faster reads.  The downside is that a
huge number of smaller attributes can require much more disk space than if
they were universally compressed.

Perhaps this is missing one additional threshold.  Perhaps there should be
a compression ratio for which the optimization is ignored.  If my
uncompressed data is 3x a TOAST chunk and compressing it would save half of
a TOAST chunk of disk space (a ~17% improvement), maybe we should still
store it compressed.  I think this threshold would be most important for
servers with many small TOAST attributes.  I don't know whether it is
necessary to make all of this configurable or whether we should just use
some heuristics to find a good balance.  In any case, I imagine the
settings should look different for the different TOAST compression methods.

Or maybe we should just make the first threshold the configuration option
for this feature.  If you set it to -1, the optimization is never used.  If
you set it to 0, you always try to save on read time at the expense of disk
space.  If you set it to 2x the TOAST chunk size, you might use up to 50%
more disk space for read path optimizations.  If you set it to 100x the
TOAST chunk size, you might use up to 1% more disk space for read
optimizations.  By default, this configuration option could be set
conservatively (e.g., 10x a TOAST chunk, which could lead to a maximum of
10% more disk space).  I suspect the increase in disk space would be much
less than these numbers in most cases, but it might be helpful to think of
these things in terms of the worst case scenario.

I apologize for thinking out loud a bit here, but I hope this gives you
some insight into my perspective on this.  In general, I am skeptical that
we can choose one threshold that will work for all PostgreSQL installations
in the known universe.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Optimize external TOAST storage

From
Robert Haas
Date:
On Tue, Mar 15, 2022 at 5:48 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I apologize for thinking out loud a bit here, but I hope this gives you
> some insight into my perspective on this.  In general, I am skeptical that
> we can choose one threshold that will work for all PostgreSQL installations
> in the known universe.

I would like to take a slightly contrary position. I think that a
system here that involves multiple knobs is going to be too
complicated to be of any real-world use, because almost nobody will
understand it or tune it properly for their installation. And who is
to say that a single setting would even be appropriate for a whole
installation, as opposed to something that needs to be tuned for each
individual table? A single tunable might be OK, but what I would
really like here is a heuristic that, while perhaps not optimal in
every environment, is good enough that a very high percentage of users
will never need to worry about it. In a case like this, a small gain
that happens automatically feels better than a large gain that
requires manual tweaking in every install.

Now that doesn't answer the question of what that heuristic should be.
I initially thought that if compression didn't end up reducing the
number of TOAST chunks then there was no point, but that's not true,
because having the last chunk be smaller than the rest can allow us to
fit more than 4 into a page rather than exactly 4. However, this
effect isn't likely to be important if most chunks are full-size
chunks. If we insert 100 full-size chunks and 1 partial chunk at the
end, we can't save much even if that chunk ends up being 1 byte
instead of a full-size chunk. 25 TOAST table pages out of 26 are going
to be full of full-size chunks either way, so we can't save more than
~4% and we might easily save nothing at all. As you say, the potential
savings are larger as the values get smaller, because a higher
proportion of the TOAST table pages are not quite full. So I think the
only cases where it's even interesting to think about suppressing this
optimization are those where the value is quite small -- and maybe the
thing to do is just pick a conservatively large threshold and call it
good.

For example, say that instead of applying this technique when there
are at least 2 TOAST chunks, we apply it when there are at least 500
(i.e. normally 1MB). It's hard for me to imagine that we can ever
lose, and in fact I think we could come down quite a bit from there
and still end up winning pretty much all the time. Setting the
threshold to, say, 50 or 100 or 150 TOAST chunks instead of 2 may
leave some money on the table, but if it means that we don't have
meaningful downsides and we don't have tunables for users to fiddle
with, it might be worth it.
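
A rough sketch of that kind of conservative heuristic (MIN_CHUNKS and the
chunk arithmetic are illustrative, not values from the patch):

#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 2000          /* stand-in for TOAST_MAX_CHUNK_SIZE */
#define MIN_CHUNKS 100           /* only consider the optimization for values this large */

static bool
store_uncompressed(size_t uncompressed_size, size_t compressed_size)
{
    size_t uchunks = (uncompressed_size + CHUNK_SIZE - 1) / CHUNK_SIZE;
    size_t cchunks = (compressed_size + CHUNK_SIZE - 1) / CHUNK_SIZE;

    /* Give up on compression only for large values where it fails to save
     * even one chunk; smaller values keep the current behavior. */
    return uchunks >= MIN_CHUNKS && cchunks >= uchunks;
}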

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Optimize external TOAST storage

From
Nathan Bossart
Date:
On Wed, Mar 16, 2022 at 10:08:45AM -0400, Robert Haas wrote:
> I would like to take a slightly contrary position. I think that a
> system here that involves multiple knobs is going to be too
> complicated to be of any real-world use, because almost nobody will
> understand it or tune it properly for their installation. And who is
> to say that a single setting would even be appropriate for a whole
> installation, as opposed to something that needs to be tuned for each
> individual table? A single tunable might be OK, but what I would
> really like here is a heuristic that, while perhaps not optimal in
> every environment, is good enough that a very high percentage of users
> will never need to worry about it. In a case like this, a small gain
> that happens automatically feels better than a large gain that
> requires manual tweaking in every install.

Agreed.  If there is a tunable, it needs to be as simple as possible.  And
ideally most users would never need to use it because the default would be
"good enough."

> Now that doesn't answer the question of what that heuristic should be.
> I initially thought that if compression didn't end up reducing the
> number of TOAST chunks then there was no point, but that's not true,
> because having the last chunk be smaller than the rest can allow us to
> fit more than 4 into a page rather than exactly 4. However, this
> effect isn't likely to be important if most chunks are full-size
> chunks. If we insert 100 full-size chunks and 1 partial chunk at the
> end, we can't save much even if that chunk ends up being 1 byte
> instead of a full-size chunk. 25 TOAST table pages out of 26 are going
> to be full of full-size chunks either way, so we can't save more than
> ~4% and we might easily save nothing at all. As you say, the potential
> savings are larger as the values get smaller, because a higher
> proportion of the TOAST table pages are not quite full. So I think the
> only cases where it's even interesting to think about suppressing this
> optimization are those where the value is quite small -- and maybe the
> thing to do is just pick a conservatively large threshold and call it
> good.
> 
> For example, say that instead of applying this technique when there
> are at least 2 TOAST chunks, we apply it when there are at least 500
> (i.e. normally 1MB). It's hard for me to imagine that we can ever
> lose, and in fact I think we could come down quite a bit from there
> and still end up winning pretty much all the time. Setting the
> threshold to, say, 50 or 100 or 150 TOAST chunks instead of 2 may
> leave some money on the table, but if it means that we don't have
> meaningful downsides and we don't have tunables for users to fiddle
> with, it might be worth it.

As long as we can demonstrate some decent improvements, I think using a
conservative threshold is a good idea.  I do wonder whether this could
weaken the read performance gains quite a bit, though.

Thinking further, is simply reducing the number of TOAST chunks the right
thing to look at?  If I want to add a TOAST attribute that requires 100,000
chunks, and you told me that I could save 10% in the read path for an extra
250 chunks of disk space, I would probably choose read performance.  If I
wanted to add 100,000 attributes that were each 3 chunks, and you told me
that I could save 10% in the read path for an extra 75,000 chunks of disk
space, I might choose the extra disk space.  These are admittedly extreme
(and maybe even impossible) examples, but my point is that the amount of
disk space you are willing to give up may be related to the size of the
attribute.  And maybe one way to extract additional read performance with
this optimization is to use a variable threshold so that we are more likely
to use it for large attributes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Optimize external TOAST storage

From
Nathan Bossart
Date:
On Wed, Mar 16, 2022 at 11:36:56AM -0700, Nathan Bossart wrote:
> Thinking further, is simply reducing the number of TOAST chunks the right
> thing to look at?  If I want to add a TOAST attribute that requires 100,000
> chunks, and you told me that I could save 10% in the read path for an extra
> 250 chunks of disk space, I would probably choose read performance.  If I
> wanted to add 100,000 attributes that were each 3 chunks, and you told me
> that I could save 10% in the read path for an extra 75,000 chunks of disk
> space, I might choose the extra disk space.  These are admittedly extreme
> (and maybe even impossible) examples, but my point is that the amount of
> disk space you are willing to give up may be related to the size of the
> attribute.  And maybe one way to extract additional read performance with
> this optimization is to use a variable threshold so that we are more likely
> to use it for large attributes.

I might be overthinking this.  Maybe it is enough to skip compressing the
attribute whenever compression saves no more than some percentage of the
uncompressed attribute size.  A conservative default setting might be
something like 5% or 10%.
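
A sketch of that percentage-based rule (min_savings stands in for a possible
setting such as 0.05 or 0.10; illustrative only):

#include <stdbool.h>
#include <stddef.h>

/* Keep the compressed copy only if it shrinks the value by at least
 * min_savings (expressed as a fraction of the uncompressed size). */
static bool
keep_compressed(size_t uncompressed_size, size_t compressed_size,
                double min_savings)
{
    return (double) compressed_size <=
           (1.0 - min_savings) * (double) uncompressed_size;
}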

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Optimize external TOAST storage

From
Robert Haas
Date:
On Wed, Mar 16, 2022 at 2:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Thinking further, is simply reducing the number of TOAST chunks the right
> thing to look at?  If I want to add a TOAST attribute that requires 100,000
> chunks, and you told me that I could save 10% in the read path for an extra
> 250 chunks of disk space, I would probably choose read performance.  If I
> wanted to add 100,000 attributes that were each 3 chunks, and you told me
> that I could save 10% in the read path for an extra 75,000 chunks of disk
> space, I might choose the extra disk space.  These are admittedly extreme
> (and maybe even impossible) examples, but my point is that the amount of
> disk space you are willing to give up may be related to the size of the
> attribute.  And maybe one way to extract additional read performance with
> this optimization is to use a variable threshold so that we are more likely
> to use it for large attributes.

Right, so perhaps the ultimate thing here would be a more fine-grained
knob than SET STORAGE EXTERNAL -- something that allows you to specify
that you want to compress only when it really helps. While some people
might find that useful, I think the current patch is less ambitious,
and I think that's OK. It just wants to save something in the cases
where it's basically free. Unfortunately we've learned that it's never
*entirely* free because making the last TOAST chunk longer can always
cost you something, even if it gets longer by only 1 byte. But for
larger values it's hard for that to be significant.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Optimize external TOAST storage

From
Nathan Bossart
Date:
On Thu, Mar 17, 2022 at 01:04:17PM -0400, Robert Haas wrote:
> Right, so perhaps the ultimate thing here would be a more fine-grained
> knob than SET STORAGE EXTERNAL -- something that allows you to specify
> that you want to compress only when it really helps. While some people
> might find that useful, I think the current patch is less ambitious,
> and I think that's OK. It just wants to save something in the cases
> where it's basically free. Unfortunately we've learned that it's never
> *entirely* free because making the last TOAST chunk longer can always
> cost you something, even if it gets longer by only 1 byte. But for
> larger values it's hard for that to be significant.

I guess I think we should be slightly more ambitious.  One idea could be to
create a default_toast_compression_ratio GUC with a default of 0.95.  This
means that, by default, a compressed attribute must be 0.95x or less of the
size of the uncompressed attribute to be stored compressed.  Like
default_toast_compression, this could also be overridden at the column
level with something like

    ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;

If the current "basically free" patch is intended for v15, then maybe all
this extra configurability stuff could wait for a bit, especially if we can
demonstrate a decent read performance boost with a more conservative
setting.  However, I don't see anything terribly complicated about the
proposed configurability changes (assuming my proposal makes some amount of
sense to you and others).

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Optimize external TOAST storage

From
Robert Haas
Date:
On Thu, Mar 17, 2022 at 4:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Thu, Mar 17, 2022 at 01:04:17PM -0400, Robert Haas wrote:
> > Right, so perhaps the ultimate thing here would be a more fine-grained
> > knob than SET STORAGE EXTERNAL -- something that allows you to specify
> > that you want to compress only when it really helps. While some people
> > might find that useful, I think the current patch is less ambitious,
> > and I think that's OK. It just wants to save something in the cases
> > where it's basically free. Unfortunately we've learned that it's never
> > *entirely* free because making the last TOAST chunk longer can always
> > cost you something, even if it gets longer by only 1 byte. But for
> > larger values it's hard for that to be significant.
>
> I guess I think we should be slightly more ambitious.  One idea could be to
> create a default_toast_compression_ratio GUC with a default of 0.95.  This
> means that, by default, a compressed attribute must be 0.95x or less of the
> size of the uncompressed attribute to be stored compressed.  Like
> default_toast_compression, this could also be overridden at the column
> level with something like
>
>         ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;
>
> If the current "basically free" patch is intended for v15, then maybe all
> this extra configurability stuff could wait for a bit, especially if we can
> demonstrate a decent read performance boost with a more conservative
> setting.  However, I don't see anything terribly complicated about the
> proposed configurability changes (assuming my proposal makes some amount of
> sense to you and others).

We seem to have a shortage of "others" showing up with opinions on
this topic, but I guess I'm not very confident about the general
utility of such a setting. Just to be clear, I'm also not very
confident about the usefulness of the existing settings for
controlling TOAST. Why is it useful default behavior to try to get
rows down to 2kB by default, rather than 1.8kB or 3kB? Even more, why
don't we try to compress primarily based on the length of individual
attributes and then compress further only if the resulting tuple
doesn't fit into a page at all? There doesn't seem to be anything
magic about fitting tuples into a quarter-page, yet the code acts as
though that's the primary goal - and then, when that didn't turn out
to work well in all cases, we added a user-settable parameter
(toast_tuple_target) to let you say you really want tuples in table X
to fit into a third of a page or a fifth of a page instead of a
quarter. And there's some justification for that: a proposal to
fundamentally change the algorithm would likely have gotten bogged
down for years, and the parameter no doubt lets you solve some
problems. Yet if the whole algorithm is wrong, and I think maybe it
is, then being able to change the constants is not really getting us
where we need to be.

Then, too, I'm not very confident about the usefulness of EXTENDED,
EXTERNAL, and MAIN. I think it's useful to be able to categorically
disable compression (as EXTERNAL does), but you can't categorically
disable out-of-line storage because the value can be bigger than the
page, so MAIN vs. EXTENDED is just changing the threshold for the use
of out-of-line storage. However, it does so in a way that IMHO is not
particularly intuitive, which goes back to my earlier point about the
algorithm seeming wacky, and it's in any case unclear why we should
offer exactly two possible thresholds and not any of the intermediate
values.

I think I'm prepared to concede that the setting you are proposing
here is easier to understand than either of those. Saying "store this
value compressed only if you can thereby reduce the size by at least
10%" is an understandable directive which I think will make sense even
to people who know very little about PostgreSQL or databases
generally, as long as they have some vague idea how compression works.
However, I guess I am not confident that the setting would get much
use, or that it is the best way to measure what we care about. For
instance, if my value is 19kB in length, saving 10% isn't very
impactful, because it's still the same number of chunks, and the only
benefit I gain is that the last chunk is smaller, which is probably
not very impactful because I'm still going to have 8 full chunks. I
will gain more if the value is bigger or smaller. Going up, if I
increase the size to 19MB or 190MB, I'm really saving a very
significant amount of cache space and, potentially, I/O. If I was
happy with a 10% savings at 19kB, I will likely be happy with a
considerably smaller economy here. Going down, if I decrease the size
to 1.9kB, I'm not storing any full-sized chunks any more, so reducing
the size of the one partial chunk that I'm storing seems like it could
have a real impact, provided of course that other toasted values in
the same table are similarly small. If the whole TOAST table is full
of just partial chunks, then the smaller they are the more will fit
into a page, all things being equal, and thus the smaller the TOAST
table will be.

Maybe the conclusion here is that more thought is needed before
changing anything in this area....

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Optimize external TOAST storage

From
Nathan Bossart
Date:
On Tue, Mar 22, 2022 at 04:34:05PM -0400, Robert Haas wrote:
> We seem to have a shortage of "others" showing up with opinions on
> this topic, but I guess I'm not very confident about the general
> utility of such a setting. Just to be clear, I'm also not very
> confident about the usefulness of the existing settings for
> controlling TOAST. Why is it useful default behavior to try to get
> rows down to 2kB by default, rather than 1.8kB or 3kB? Even more, why
> don't we try to compress primarily based on the length of individual
> attributes and then compress further only if the resulting tuple
> doesn't fit into a page at all? There doesn't seem to be anything
> magic about fitting tuples into a quarter-page, yet the code acts as
> though that's the primary goal - and then, when that didn't turn out
> to work well in all cases, we added a user-settable parameter
> (toast_tuple_target) to let you say you really want tuples in table X
> to fit into a third of a page or a fifth of a page instead of a
> quarter. And there's some justification for that: a proposal to
> fundamentally change the algorithm would likely have gotten bogged
> down for years, and the parameter no doubt lets you solve some
> problems. Yet if the whole algorithm is wrong, and I think maybe it
> is, then being able to change the constants is not really getting us
> where we need to be.
> 
> Then, too, I'm not very confident about the usefulness of EXTENDED,
> EXTERNAL, and MAIN. I think it's useful to be able to categorically
> disable compression (as EXTERNAL does), but you can't categorically
> disable out-of-line storage because the value can be bigger than the
> page, so MAIN vs. EXTENDED is just changing the threshold for the use
> of out-of-line storage. However, it does so in a way that IMHO is not
> particularly intuitive, which goes back to my earlier point about the
> algorithm seeming wacky, and it's in any case unclear why we should
> offer exactly two possible thresholds and not any of the intermediate
> values.

I agree with all of this.  Adding configurability for each constant might
help folks in the short term, but using these knobs probably requires quite
a bit of expertise in Postgres internals as well as a good understanding of
your data.  I think we ought to revisit TOAST configurability from a user
perspective.  IOW what can be chosen automatically, and how do we enable
users to effectively configure the settings that cannot be chosen
automatically?  IMO this is a worthwhile conversation to have as long as it
doesn't stray too far into the "let's rewrite TOAST" realm.  I think there
is always going to be some level of complexity with stuff like TOAST, but
there are probably all sorts of ways to simplify/improve it also.

> Maybe the conclusion here is that more thought is needed before
> changing anything in this area....

You've certainly got me thinking more about this.  If the scope of this
work is going to expand, a few months before the first v16 commitfest is
probably the right time for it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Optimize external TOAST storage

From
Michael Paquier
Date:
On Tue, Mar 22, 2022 at 02:42:53PM -0700, Nathan Bossart wrote:
> On Tue, Mar 22, 2022 at 04:34:05PM -0400, Robert Haas wrote:
>> Then, too, I'm not very confident about the usefulness of EXTENDED,
>> EXTERNAL, and MAIN. I think it's useful to be able to categorically
>> disable compression (as EXTERNAL does), but you can't categorically
>> disable out-of-line storage because the value can be bigger than the
>> page, so MAIN vs. EXTENDED is just changing the threshold for the use
>> of out-of-line storage. However, it does so in a way that IMHO is not
>> particularly intuitive, which goes back to my earlier point about the
>> algorithm seeming wacky, and it's in any case unclear why we should
>> offer exactly two possible thresholds and not any of the intermediate
>> values.
>
> I agree with all of this.  Adding configurability for each constant might
> help folks in the short term, but using these knobs probably requires quite
> a bit of expertise in Postgres internals as well as a good understanding of
> your data.  I think we ought to revisit TOAST configurability from a user
> perspective.  IOW what can be chosen automatically, and how do we enable
> users to effectively configure the settings that cannot be chosen
> automatically?  IMO this is a worthwhile conversation to have as long as it
> doesn't stray too far into the "let's rewrite TOAST" realm.  I think there
> is always going to be some level of complexity with stuff like TOAST, but
> there are probably all sorts of ways to simplify/improve it also.

I agree with this feeling.  TOAST already has too many configuration
parameters, each behaving in its own slightly different way.
If we could reduce this number rather than increase it, so much the better.
I would also be curious to see how relevant those parameters remain
with more compression options possible for toast values.
--
Michael

Attachment

Re: Optimize external TOAST storage

From
Dilip Kumar
Date:
On Wed, Mar 23, 2022 at 9:58 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Mar 22, 2022 at 02:42:53PM -0700, Nathan Bossart wrote:
> > On Tue, Mar 22, 2022 at 04:34:05PM -0400, Robert Haas wrote:
> >> Then, too, I'm not very confident about the usefulness of EXTENDED,
> >> EXTERNAL, and MAIN. I think it's useful to be able to categorically
> >> disable compression (as EXTERNAL does), but you can't categorically
> >> disable out-of-line storage because the value can be bigger than the
> >> page, so MAIN vs. EXTENDED is just changing the threshold for the use
> >> of out-of-line storage. However, it does so in a way that IMHO is not
> >> particularly intuitive, which goes back to my earlier point about the
> >> algorithm seeming wacky, and it's in any case unclear why we should
> >> offer exactly two possible thresholds and not any of the intermediate
> >> values.
> >
> > I agree with all of this.  Adding configurability for each constant might
> > help folks in the short term, but using these knobs probably requires quite
> > a bit of expertise in Postgres internals as well as a good understanding of
> > your data.  I think we ought to revisit TOAST configurability from a user
> > perspective.  IOW what can be chosen automatically, and how do we enable
> > users to effectively configure the settings that cannot be chosen
> > automatically?  IMO this is a worthwhile conversation to have as long as it
> > doesn't stray too far into the "let's rewrite TOAST" realm.  I think there
> > is always going to be some level of complexity with stuff like TOAST, but
> > there are probably all sorts of ways to simplify/improve it also.
>
> I agree with this feeling.  TOAST already has too many configuration
> parameters, each behaving in its own slightly different way.
> If we could reduce this number rather than increase it, so much the better.
> I would also be curious to see how relevant those parameters remain
> with more compression options possible for toast values.

I agree with you; it doesn't make much sense here to add an extra
user-visible configuration parameter when it is not really clear to
users how to configure it to get the best out of it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Optimize external TOAST storage

From
Dilip Kumar
Date:
On Fri, Mar 18, 2022 at 1:35 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
>
> I guess I think we should be slightly more ambitious.  One idea could be to
> create a default_toast_compression_ratio GUC with a default of 0.95.  This
> means that, by default, a compressed attribute must be 0.95x or less of the
> size of the uncompressed attribute to be stored compressed.  Like
> default_toast_compression, this could also be overridden at the column
> level with something like

I am not sure that we want a GUC to control that, but we can certainly
be more ambitious.  Basically, with the current patch, if the data is
even slightly large then we always prefer to store the compressed
data; e.g., if the data size is 200kB, then even if compression saves
as little as 1% we would still choose to store the compressed data.

I think we can base the decision on the compression ratio and then
upper-bound it with the difference in chunk counts.  For example, if
compression saves < 10%, store the value uncompressed only if the chunk
difference is below a threshold.  But with that we might see a performance
impact on smaller data that compresses by < 10%, because its chunk
difference will always be under the threshold.  So maybe the chunk-difference
threshold could be a function of the total number of chunks required for
the data, perhaps a logarithmic function, so that the threshold grows
slowly along with the base data size.
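
A rough sketch of that idea (the 10% cutoff and the log-based threshold are
illustrative, not proposed values):

#include <math.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 2000          /* stand-in for TOAST_MAX_CHUNK_SIZE */

static bool
store_uncompressed(size_t uncompressed_size, size_t compressed_size)
{
    size_t  uchunks = (uncompressed_size + CHUNK_SIZE - 1) / CHUNK_SIZE;
    size_t  cchunks = (compressed_size + CHUNK_SIZE - 1) / CHUNK_SIZE;
    double  saved_ratio = 1.0 - (double) compressed_size / (double) uncompressed_size;
    double  chunks_saved = (double) uchunks - (double) cchunks;
    double  threshold = log2((double) uchunks + 1.0);   /* grows slowly with size */

    /* Store uncompressed only when compression is poor and the number of
     * chunks it saves stays below the size-dependent threshold. */
    return saved_ratio < 0.10 && chunks_saved < threshold;
}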

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com