I think this is an interesting idea. My first thought is that it would be
nice if we didn't have to first compress the data to see whether we wanted
to store it compressed or not, but I don't think there is much we can do
about that. In any case, we aren't adding much work in the write path
compared to the potential savings in the read path, so that is probably
okay.
Do you think it is worth making this configurable? I don't think it is
outside the realm of possibility for some users to care more about disk
space than read performance. And maybe the threshold for this optimization
could differ based on the workload.
extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
int options);
+extern void toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute,
+ int options, Datum old_toast_value,
+ ToastAttrInfo *old_toast_attr);
Could we bake this into toast_tuple_externalize() so that all existing
callers benefit from this optimization? Is there a reason to export both
functions? Perhaps toast_tuple_externalize() should be renamed and made
static, and then this new function could be called
toast_tuple_externalize() (which would be a wrapper around the internal
function).
+ /* Sanity check: if data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+ toast_tuple_externalize(ttc, attribute, options);
With a --enable-cassert build, this line causes assertion failures in the
call to GetMemoryChunkContext() from pfree(). 'make check' is enough to
reproduce it. Specifically, it fails the following assertion:
AssertArg(MemoryContextIsValid(context));
I didn't see an existing commitfest entry for this patch. I'd encourage
you to add one in the July commitfest:
https://commitfest.postgresql.org/38/
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com