Imagine a function that was going to take a table of, say, images, and
so something to them over and over, like:
SELECT pixel_count(img), clr_name, img_name
FROM images img
CROSS JOIN colors clr
When you run this function, you find that a great amount of time is
being spend in the decompression/detoasting routines, so you think: I
have a nested loop here, driven on the 'img' side, if I can avoid
re-loading the big image object over and over I can make things
faster.
Getting the datum value is really fast, so I can have a cache that
keeps the latest detoasted object around, and update it when the datum
changes, and store the cache information in the parent context. Like
so:
struct {
Datum d;
bytea *ba;
} DatumCache;
PG_FUNCTION_INFO_V1(pixel_count);
Datum pixel_count(PG_FUNCTION_ARGS)
{
Datum d = PG_GETARG_DATUM(0);
DatumCache *dcache = fcinfo->flinfo->fn_extra;
bytea *ba;
if (!dcache)
{
dcache = MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
sizeof(DatumCache));
fcinfo->flinfo->fn_extra = dcache;
}
if (dcache->d != d)
{
if (dcache->ba) pfree(dcache->ba);
MemoryContext old_context =
MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
dcache->ba = PG_GETARG_BYTEA_P_COPY(0);
MemoryContextSwitchTo(old_context);
}
ba = dcache->ba;
/* now do things with ba here */
}
Now, notwithstanding any concerns about the particularities of my
example (I've found order-of-magnitude improvements on PostGIS
workloads avoiding the detoasting overhead this way) is my core
assumption correct: within the context of a single SQL statement, will
the Datum values for a particular object remain constant?
They *seem* to, in the examples I'm running. But do they always?