Thread: Rules for accessing tuple data in backend code

Rules for accessing tuple data in backend code

From
Peter Eisentraut
Date:
I'm sort of confused about the ways in which you can access tuple data you
get from heap scans or syscache lookups.  Perhaps this can be cleared up
and documented, because new contributors might like this information.
Here's the information and questions I have:

Tuples obtained from heap scans (heap_getnext, etc.) can always be
dissected with heap_getattr().

Tuples obtained from syscache lookups (SearchSysCache) can always be
dissected with SysCacheGetAttr().

What happens when I try heap_getattr() on a syscache tuple?

Tuples obtained from heap scans or syscache lookups may be dissected via
GETSTRUCT if and only if the attribute and all attributes prior to it are
fixed-length and non-nullable.

(Probably there should be cases about explicit index scans here, but I
haven't done those and they should be rare.)

The question I'm particularly struggling with is, when does TOASTing and
de-TOASTing happen?  And if it doesn't, what's the official way to do it?
I've found PG_DETOAST_DATUM and PG_DETOAST_DATUM_COPY.  Why would I want a
copy?  (How can detoasting happen without copying?)  And if I want a copy,
in what memory context does it live?  And can I just pfree() the copy if I
don't want it any longer?

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Rules for accessing tuple data in backend code

From
John Gray
Date:
I can't help with most of the question, but as I've implemented new
TOAST access methods, I can answer this part:

On Mon, 2002-01-28 at 21:51, Peter Eisentraut wrote:
> 
> The question I'm particularly struggling with is, when does TOASTing and
> de-TOASTing happen?  And if it doesn't, what's the official way to do it?
> I've found PG_DETOAST_DATUM and PG_DETOAST_DATUM_COPY.  Why would I want a
> copy?  (How can detoasting happen without copying?)  And if I want a copy,
> in what memory context does it live?  And can I just pfree() the copy if I
> don't want it any longer?

I think there are two contexts for detoasting.

1) fmgr functions. The PG_GETARG_ macros for TOASTable types fetch the
argument Datum and pass it through PG_DETOAST_DATUM. Thus the Datum
from PG_GETARG_ is always detoasted.

2) Other access. I believe that heap_getattr will return a Datum which,
for TOASTable types, points to a varlena struct. This may contain either
the literal data for the value (compressed or not) or the TOAST pointer
(toastrelid, toastvalueid). These various cases are distinguished by the
top two bits of the varlena length field.

In all cases other than the "uncompressed, inline" case, the value must
be passed through PG_DETOAST_DATUM to guarantee a "standard" varlena,
i.e. a value that is detoasted, stored in memory, and directly
accessible from C.
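
To illustrate the "top two bits" tagging described above, here is a
minimal standalone sketch. It is not backend code: the TOY_ names and
the mask values are assumptions made for this example, and the real
definitions should be looked up in the PostgreSQL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks: the top two bits of a 32-bit length word are flags. */
#define TOY_FLAG_EXTERNAL   0x80000000u  /* value stored out of line (TOAST pointer) */
#define TOY_FLAG_COMPRESSED 0x40000000u  /* value stored compressed */
#define TOY_MASK_SIZE       0x3FFFFFFFu  /* remaining 30 bits hold the size */

typedef enum { TOY_PLAIN, TOY_COMPRESSED, TOY_EXTERNAL } ToyKind;

/* Combine a size and flag bits into one length word. */
static uint32_t toy_make_header(uint32_t size, uint32_t flags)
{
    assert(size <= TOY_MASK_SIZE);
    assert((flags & TOY_MASK_SIZE) == 0);
    return flags | size;
}

/* Classify a value by inspecting only the flag bits. */
static ToyKind toy_kind(uint32_t header)
{
    if (header & TOY_FLAG_EXTERNAL)
        return TOY_EXTERNAL;
    if (header & TOY_FLAG_COMPRESSED)
        return TOY_COMPRESSED;
    return TOY_PLAIN;
}

/* Strip the flag bits to recover the size. */
static uint32_t toy_size(uint32_t header)
{
    return header & TOY_MASK_SIZE;
}

/* Only the plain (uncompressed, inline) case can be read directly. */
static int toy_needs_detoast(uint32_t header)
{
    return toy_kind(header) != TOY_PLAIN;
}
```

The point of the sketch is that a caller cannot trust the length word as
a plain size until the flag bits have been checked, which is what the
detoast macros take care of.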

However, the pointer returned by PG_DETOAST_DATUM might be *either* a
pointer to the original varlena struct, or to a decompressed value in
newly palloc'ed space. Thus the need for PG_DETOAST_DATUM_COPY, which
makes a copy of an uncompressed varlena, so that you can treat all the
cases in the same way. I believe that the detoasted datums from the
_COPY macros are ordinary things allocated by palloc in the current
memory context, so you can write to them and pfree() them if you wish.
The non-COPY variety might return a pointer to the inside of the tuple
data, which is not to be modified!

fmgr.h defines all of these macros, and also defines PG_FREE_IF_COPY,
which compares the pointer of the detoasted Datum to the original Datum
pointer and only calls pfree if they differ.
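
The pointer-identity behaviour described above can be modelled outside
the backend. The following is a hedged toy sketch, not PostgreSQL code:
ToyValue, toy_detoast, toy_detoast_copy and toy_free_if_copy are
hypothetical stand-ins for the varlena struct, PG_DETOAST_DATUM,
PG_DETOAST_DATUM_COPY and PG_FREE_IF_COPY, malloc/free stand in for
palloc/pfree, and "compression" is faked by storing the string reversed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy value: a flag plus a NUL-terminated payload. */
typedef struct ToyValue
{
    int  compressed;   /* 1 = payload is stored reversed ("compressed") */
    char data[32];
} ToyValue;

/* Stand-in for decompression: allocate a new value with the payload restored. */
static ToyValue *toy_expand(const ToyValue *src)
{
    ToyValue *out = malloc(sizeof(ToyValue));
    size_t    n = strlen(src->data);
    size_t    i;

    assert(out != NULL);
    out->compressed = 0;
    for (i = 0; i < n; i++)
        out->data[i] = src->data[n - 1 - i];
    out->data[n] = '\0';
    return out;
}

/* Like the non-COPY macro: may hand back the input pointer itself. */
static ToyValue *toy_detoast(ToyValue *v)
{
    return v->compressed ? toy_expand(v) : v;
}

/* Like the _COPY macro: always returns fresh, writable memory. */
static ToyValue *toy_detoast_copy(ToyValue *v)
{
    ToyValue *out;

    if (v->compressed)
        return toy_expand(v);
    out = malloc(sizeof(ToyValue));
    assert(out != NULL);
    memcpy(out, v, sizeof(ToyValue));
    return out;
}

/* Free only if detoasting allocated something; returns 1 if it freed. */
static int toy_free_if_copy(ToyValue *detoasted, ToyValue *original)
{
    if (detoasted != original)
    {
        free(detoasted);
        return 1;
    }
    return 0;
}
```

In this model the result of toy_detoast must be treated as read-only,
because for an uncompressed input it is the original value; only the
result of toy_detoast_copy is safe to modify.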

Hope this helps.

Regards

John




Re: Rules for accessing tuple data in backend code

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tuples obtained from heap scans (heap_getnext, etc.) can always be
> dissected with heap_getattr().

Check.  Index scans the same.

> Tuples obtained from syscache lookups (SearchSysCache) can always be
> dissected with SysCacheGetAttr().

Check.

> What happens when I try heap_getattr() on a syscache tuple?

Works fine; in fact, SysCacheGetAttr is just a convenience routine that
invokes heap_getattr.  The reason it's convenient is that you don't
necessarily have a tuple descriptor handy for the catalog that underlies
a particular syscache.  SysCacheGetAttr knows where to find a matching
descriptor.

> Tuples obtained from heap scans or syscache lookups may be dissected via
> GETSTRUCT if and only if the attribute and all attributes prior to it are
> fixed-length and non-nullable.

Right.  GETSTRUCT per se isn't very interesting; a more helpful way to
phrase the above is that "a C struct definition can be overlaid onto
the contents of a tuple, but it's only useful out to the last
fixed-length, non-null field.  We try to arrange the contents of system
catalogs so that that usefulness extends as far as possible."
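
The struct-overlay idea can be shown with a small standalone model.
This is only a sketch under stated assumptions: ToyFormData,
toy_form_tuple, TOY_GETSTRUCT and toy_get_name are hypothetical names,
and the tuple layout (two int32 fields followed by a one-byte length and
the name bytes) is invented for the example and is not the real heap
tuple format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Only the leading fixed-width, never-null columns are declared. */
typedef struct ToyFormData
{
    int32_t relid;
    int32_t natts;
    /* a variable-length "name" column follows in the tuple body */
} ToyFormData;

/* Build a toy tuple body: fixed part, then a 1-byte length and the name. */
static void *toy_form_tuple(int32_t relid, int32_t natts, const char *name)
{
    size_t         len = strlen(name);
    ToyFormData    fixed = { relid, natts };
    unsigned char *body;

    assert(len < 256);
    body = malloc(sizeof(fixed) + 1 + len);
    assert(body != NULL);
    memcpy(body, &fixed, sizeof(fixed));
    body[sizeof(fixed)] = (unsigned char) len;
    memcpy(body + sizeof(fixed) + 1, name, len);
    return body;
}

/* Overlay the struct on the body; valid only for the declared fields. */
#define TOY_GETSTRUCT(body) ((ToyFormData *) (body))

/* The variable-length column must be located and copied out explicitly. */
static size_t toy_get_name(const void *body, char *dst, size_t dstlen)
{
    const unsigned char *p = (const unsigned char *) body + sizeof(ToyFormData);
    size_t               len = p[0];

    assert(len < dstlen);
    memcpy(dst, p + 1, len);
    dst[len] = '\0';
    return len;
}
```

The struct gives cheap access to relid and natts, but it says nothing
about where later columns sit, which is why anything past the last
fixed-length, non-null field needs an attribute-fetching routine.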

> (Probably there should be cases about explicit index scans here, but I
> haven't done those and they should be rare.)

For these purposes index and heap scans are the same; either one
ultimately gives back a pointer to a tuple sitting in a disk buffer.

> The question I'm particularly struggling with is, when does TOASTing and
> de-TOASTing happen?

It doesn't, at the level of heap_getattr().  For a pass-by-reference
datatype (which includes all toastable types, a fortiori), heap_getattr
simply gives you back a Datum which is a pointer to the relevant place
in the tuple.  In general, you are not supposed to do anything with a
Datum except pass it around, unless you know the specific datatype of
the value and know how to operate on it.  For toastable datatypes, part
of "knowing how to operate on it" is to know to call pg_detoast_datum()
anytime you are handed a Datum that might possibly point at a toasted
value.

For the most part, datatype-specific operations are localized in
fmgr-callable functions, so it's possible to hide most of the knowledge
about detoasting in PG_GETARG_FOO macros for the affected datatypes.

> I've found PG_DETOAST_DATUM and PG_DETOAST_DATUM_COPY.  Why would I want a
> copy?  (How can detoasting happen without copying?)

PG_DETOAST_DATUM_COPY guarantees to give you a copy, even if the
original wasn't toasted.  This allows you to scribble on the input,
in case that happens to be a useful way of forming your result.
Without a forced copy, a routine for a pass-by-ref datatype must
NEVER, EVER scribble on its input ... because very possibly it'd
be scribbling on a valid tuple in a disk buffer, or a valid entry
in the syscache.

> And if I want a copy, in what memory context does it live?

It's just palloc'd, so it's whatever is CurrentMemoryContext.

> And can I just pfree() the copy if I don't want it any longer?

Yes.  In many scenarios you don't have to because CurrentMemoryContext
is short-lived, though.  There are a lot of pfree's in the system that
are really just wasted cycles.
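
Why an explicit pfree is often redundant can be sketched with a toy
arena. This is a simplified model under stated assumptions, not the
backend's memory-context implementation: ToyContext,
toy_current_context, toy_palloc and toy_context_reset are hypothetical
names standing in for a memory context, CurrentMemoryContext, palloc and
a context reset.

```c
#include <assert.h>
#include <stdlib.h>

/* Chunk header; the extra members only force generous alignment. */
typedef union ToyChunk
{
    union ToyChunk *next;
    long double     align_ld;
    long long       align_ll;
} ToyChunk;

/* A context remembers every chunk allocated in it. */
typedef struct ToyContext
{
    ToyChunk *chunks;   /* linked list of live chunks */
    int       live;     /* number of live chunks */
} ToyContext;

/* Stand-in for the notion of a "current" context. */
static ToyContext *toy_current_context;

/* Allocate in whatever context is current at the time of the call. */
static void *toy_palloc(size_t size)
{
    ToyChunk *c;

    assert(toy_current_context != NULL);
    c = malloc(sizeof(ToyChunk) + size);
    assert(c != NULL);
    c->next = toy_current_context->chunks;
    toy_current_context->chunks = c;
    toy_current_context->live++;
    return c + 1;       /* usable memory starts after the header */
}

/* Release everything allocated in the context in one sweep. */
static void toy_context_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
    {
        ToyChunk *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
        cxt->live--;
    }
}
```

In this model, a caller that allocates a few copies and returns without
freeing them leaks nothing once the context is reset, which is the
situation in which individual frees are just extra work.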
        regards, tom lane