Re: Optimize external TOAST storage - Mailing list pgsql-hackers

From davinder singh
Subject Re: Optimize external TOAST storage
Date
Msg-id CAHzhFSE8Vx7dUC3P9NYMkZi0i9y-hmBtOS_ScxHvhOCxNvJe8Q@mail.gmail.com
Whole thread Raw
In response to Re: Optimize external TOAST storage  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Optimize external TOAST storage  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers

Thanks, Nathan, for the review comments. Please find the updated patch.
On Sun, Mar 13, 2022 at 3:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote: 
Do you think it is worth making this configurable?  I don't think it is
outside the realm of possibility for some users to care more about disk
space than read performance.  And maybe the threshold for this optimization
could differ based on the workload.

I think here we can break it into two parts.
The first part if the user cares about the disk more than reading performance, disable this?
That is a good idea, we can try this, lets see what others say.

Regarding the 2nd part of configuring the threshold, Based on our
experiments, we have fixed it for the attributes with size > 2 * chunk_size.
The default chunk_size is 2KB and the page size is 8KB. While toasting each
attribute is divided into chunks, and each page can hold a max of 4 such chunks.
We only need to think about the space used by the last chunk of the attribute.
This means with each value optimization, it might use extra space in the range
(0B,2KB]. I think this extra space is independent of attribute size. So we don't
need to worry about configuring this threshold. Let me know if I missed something
here.


 extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
                                     int options);
+extern void toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute,
+                                        int options, Datum old_toast_value,
+                                        ToastAttrInfo *old_toast_attr);

Could we bake this into toast_tuple_externalize() so that all existing
callers benefit from this optimization?  Is there a reason to export both
functions?  Perhaps toast_tuple_externalize() should be renamed and made
static, and then this new function could be called
toast_tuple_externalize() (which would be a wrapper around the internal
function).
This function is used only in heap_toast_insert_or_update(), all existing callers are
using new function only. As you suggested, I have renamed the new function as
wrapper function and also only exporting the new function.
 

+    /* Sanity check: if data is not compressed then we can proceed as usual. */
+    if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+        toast_tuple_externalize(ttc, attribute, options);

With a --enable-cassert build, this line causes assertion failures in the
call to GetMemoryChunkContext() from pfree().  'make check' is enough to
reproduce it.  Specifically, it fails the following assertion:

        AssertArg(MemoryContextIsValid(context));

Thanks for pointing it out, this failure started because I was not handling the
case when the data is already compressed even before toasting. Following
check verifies if the data is compressed or not but that is not enough because
we can't optimize the toasting if we didn't get the data in the
uncompressed form in the first place from the source.
+    /* If data is not compressed then we can proceed as usual. */
+    if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))

v1 patch didn't have this problem because it was verifying whether we have
compressed data in this toasting round or not. I have changed back to the
earlier version.
+    /* If data is not compressed then we can proceed as usual. */
+    if (*value == orig_toast_value)



--
Regards,
Davinder
Attachment

pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: pg14 psql broke \d datname.nspname.relname
Next
From: Greg Stark
Date:
Subject: Re: Tablesync early exit