Re: Reducing the chunk header sizes on all memory context types - Mailing list pgsql-hackers

From David Rowley
Subject Re: Reducing the chunk header sizes on all memory context types
Date
Msg-id CAApHDvrsqN_qM7JaiKgR3Sf5YmnYvgf6kcFUp=w1Ot06nZAfUw@mail.gmail.com
Whole thread Raw
In response to Re: Reducing the chunk header sizes on all memory context types  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Reducing the chunk header sizes on all memory context types
List pgsql-hackers
On Fri, 9 Sept 2022 at 11:33, David Rowley <dgrowleyml@gmail.com> wrote:
> I really think the Assert only form of MemoryContextContains() is the
> best move, and if it's doing Assert only, then we can do the
> loop-over-the-blocks idea as you described and I drafted in [1].
>
> If the need comes up that we're certain we always have a pointer to
> some allocated chunk, but need to know if it's in some memory context,
> then the proper form of expressing that, I think, should be:
>
> if (GetMemoryChunkContext(pointer) == somecontext)
>
> If we're worried about getting that wrong, we can beef up the
> MemoryChunk struct with a magic_number field in
> MEMORY_CONTEXT_CHECKING builds to ensure we catch any code which
> passes invalid pointers.

I've attached a patch series which is my proposal on what we should do
about MemoryContextContains. In summary, this basically changes
MemoryContextContains() so it's only available in
MEMORY_CONTEXT_CHECKING builds and removes 4 current usages of the
function.

0001: Makes MemoryContextContains work again with the
loop-over-the-blocks method mentioned by Tom.
0002: Adds a new "chunk_magic" field to MemoryChunk.  My thoughts are
that it might be a good idea to do this so that we get Assert failures
if we use functions like GetMemoryChunkContext() on a pointer that's
not a MemoryChunk.
0003: This adjusts aggregate final functions and window functions so
that any byref Datum they return is allocated in CurrentMemoryContext
0004: Makes MemoryContextContains only available in
MEMORY_CONTEXT_CHECKING builds and adjusts our usages of the function
to use GetMemoryChunkContext() instead.

An alternative to 0004, would be more along the lines of what was
mentioned by Andres and just Assert that the returned value is in the
memory context that we expect. I don't think we need to do anything
special with aggregates that take an internal state.  I think the rule
is just as simple as;  all final functions and window functions must
return any byref values in CurrentMemoryContext.  For aggregates
without a finalfn, we can just datumCopy() the returned byref value.
There's no chance for those to be in CurrentMemoryContext anyway. The
return value must be in the aggregate state's context.  The attached
assert.patch shows that this holds true in make check after applying
each of the other patches.

I see that one of the drawbacks from not using MemoryContextContains()
is that window functions such as lead(), lag(), first_value(),
last_value() and nth_value() may now do the datumCopy() when it's not
needed. For example, with a window function call such as
lead(byref_col ), the expression evaluation code in
WinGetFuncArgInPartition() will just return the address in the
tuplestore tuple for "byref_col".  The datumCopy() is needed for that.
However, if the function call was lead(byref_col || 'something') then
we'd have ended up with a new allocation in CurrentMemoryContext to
concatenate the two values.  We'll now do a datumCopy() where we
previously wouldn't have. I don't really see any way around that
without doing some highly invasive surgery to the expression
evaluation code.

None of the attached patches are polished. I can do that once we agree
on the best way to fix the issue.

David

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Transparent column encryption
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Error "initial slot snapshot too large" in create replication slot