Thread: checking my understanding of TupleDesc

From
Chapman Flack
Date:
From looking around the code, I've made these tentative observations
about TupleDescs:

1. If the TupleDesc was obtained straight from the relcache for some
   relation, then all of its attributes should have nonzero attrelid
   identifying that relation, but in (every? nearly every?) other case,
   the attributes found in a TupleDesc will have a dummy attrelid of zero.

2. The attributes in a TupleDesc will (always?) have consecutive attnum
   corresponding to their positions in the TupleDesc (and therefore
   redundant). A query, say, that projects out a subset of columns
   from a relation will not have a result TupleDesc with attributes
   still bearing their original attrelid and attnum; they'll have
   attrelid zero and consecutive renumbered attnum.

   Something like SendRowDescriptionCols_3 that wants the original table
   and attnum has to reconstruct them from the targetlist if available,

Have I mistaken any of that?

Thanks,
-Chap
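
Observations 1 and 2 above can be written down as checks against a toy model. This is only a sketch: MockAttr, MockTupleDesc and every mock_* function are hypothetical stand-ins invented for illustration, not PostgreSQL's real structs or API, and the projection helper simply assumes the behavior described in the message (attrelid reset to zero, attnum renumbered).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins, NOT PostgreSQL's real pg_attribute/TupleDesc structs. */
typedef unsigned int MockOid;

typedef struct MockAttr
{
    MockOid attrelid;   /* owning relation, or 0 when tied to none */
    int     attnum;     /* 1-based column number */
    MockOid atttypid;   /* column type */
} MockAttr;

typedef struct MockTupleDesc
{
    int       natts;
    MockAttr *attrs;
} MockTupleDesc;

/* Observation 2: attnum equals position + 1 for every attribute. */
static bool
mock_attnums_match_positions(const MockTupleDesc *desc)
{
    for (int i = 0; i < desc->natts; i++)
    {
        if (desc->attrs[i].attnum != i + 1)
            return false;
    }
    return true;
}

/* Observation 1: every attribute carries the same attrelid (0 = dummy). */
static bool
mock_all_attrelid_equal(const MockTupleDesc *desc, MockOid relid)
{
    for (int i = 0; i < desc->natts; i++)
    {
        if (desc->attrs[i].attrelid != relid)
            return false;
    }
    return true;
}

/*
 * Model a projection: copy the chosen source columns (0-based indexes in
 * cols) into out, resetting attrelid to 0 and renumbering attnum from 1.
 * The caller provides out->attrs with room for ncols entries.
 */
static void
mock_project(const MockTupleDesc *src, const int *cols, int ncols,
             MockTupleDesc *out)
{
    for (int i = 0; i < ncols; i++)
    {
        assert(cols[i] >= 0 && cols[i] < src->natts);
        out->attrs[i].attrelid = 0;
        out->attrs[i].attnum = i + 1;
        out->attrs[i].atttypid = src->attrs[cols[i]].atttypid;
    }
    out->natts = ncols;
}
```

In this model the projected descriptor keeps only the column types; the original relation and column number are gone, which is why a consumer that wants them has to look elsewhere.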



Re: checking my understanding of TupleDesc

From
Chapman Flack
Date:
On 09/29/19 20:13, Chapman Flack wrote:
> From looking around the code, I've made these tentative observations
> about TupleDescs:
> 
> 1. If the TupleDesc was obtained straight from the relcache for some
>    relation, then all of its attributes should have nonzero attrelid
>    identifying that relation, but in (every? nearly every?) other case,
>    the attributes found in a TupleDesc will have a dummy attrelid of zero.
> 
> 2. The attributes in a TupleDesc will (always?) have consecutive attnum
>    corresponding to their positions in the TupleDesc (and therefore
>    redundant). A query, say, that projects out a subset of columns
>    from a relation will not have a result TupleDesc with attributes
>    still bearing their original attrelid and attnum; they'll have
>    attrelid zero and consecutive renumbered attnum.
> 
>    Something like SendRowDescriptionCols_3 that wants the original table
>    and attnum has to reconstruct them from the targetlist if available,
> 
> Have I mistaken any of that?

And one more:

  3. One could encounter a TupleDesc with one or more 'attisdropped'
     attributes, which do have their original attnums (corresponding
     to their positions in the TupleDesc and therefore redundant),
     so the attnums of nondropped attributes may be discontiguous.
     In building a corresponding tuple, any dropped attribute should
     have its null flag set.

     Is it simple to say under what circumstances a TupleDesc possibly
     with dropped members could be encountered, and under what other
     circumstances one would only encounter 'cleaned up' TupleDescs with
     no dropped attributes, and contiguous numbers for the real ones?

Regards,
-Chap
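
The rule in observation 3, that a dropped attribute should have its null flag set when a tuple is built, might look like this in a toy model. MockAttr and mock_mark_dropped_null are hypothetical names used only for illustration; only the attnum and attisdropped field names are taken from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for one attribute of a tuple descriptor. */
typedef struct MockAttr
{
    int  attnum;        /* 1-based position, kept even when dropped */
    bool attisdropped;  /* true once the column has been dropped */
} MockAttr;

/*
 * Before forming a tuple, force the null flag for every dropped attribute.
 * Flags of live attributes are left as the caller set them.
 * Returns the number of live (non-dropped) attributes.
 */
static int
mock_mark_dropped_null(const MockAttr *attrs, int natts, bool *isnull)
{
    int live = 0;

    assert(attrs != NULL && isnull != NULL);
    for (int i = 0; i < natts; i++)
    {
        if (attrs[i].attisdropped)
            isnull[i] = true;
        else
            live++;
    }
    return live;
}
```

Note that the descriptor still has natts slots: the dropped column occupies its original position, so the live columns end up with discontiguous attnums.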



Re: checking my understanding of TupleDesc

From
Tom Lane
Date:
Chapman Flack <chap@anastigmatix.net> writes:
> On 09/29/19 20:13, Chapman Flack wrote:
>> From looking around the code, I've made these tentative observations
>> about TupleDescs:
>> 
>> 1. If the TupleDesc was obtained straight from the relcache for some
>> relation, then all of its attributes should have nonzero attrelid
>> identifying that relation, but in (every? nearly every?) other case,
>> the attributes found in a TupleDesc will have a dummy attrelid of zero.

I'm not sure about every vs. nearly every, but otherwise this seems
accurate.  Generally attrelid is meaningful in a pg_attribute catalog
entry, but not in TupleDescs in memory.  It appears valid in relcache
entry tupdescs only because they are built straight from pg_attribute.

>> 2. The attributes in a TupleDesc will (always?) have consecutive attnum
>> corresponding to their positions in the TupleDesc (and therefore
>> redundant).

Correct.

> And one more:

>   3. One could encounter a TupleDesc with one or more 'attisdropped'
>      attributes, which do have their original attnums (corresponding
>      to their positions in the TupleDesc and therefore redundant),
>      so the attnums of nondropped attributes may be discontiguous.

Right.

>      Is it simple to say under what circumstances a TupleDesc possibly
>      with dropped members could be encountered,

Any tupdesc that's describing the rowtype of a table with dropped columns
would look like that.

>      and under what other
>      circumstances one would only encounter 'cleaned up' TupleDescs with
>      no dropped attributes, and contiguous numbers for the real ones?

I don't believe we ever include dropped columns in a projection result,
so generally speaking, the output of a query plan node wouldn't have them.

There's a semi-exception, which is that the planner might decide that we
can skip a projection step for the output of a table scan node, in which
case dropped columns would be included in its output.  But that would only
be true if there are upper plan nodes that are doing some projections of
their own.  The final query output will definitely not have them.

            regards, tom lane
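
The contrast between a table's rowtype descriptor and a projection result can be sketched as follows. This is a toy model under the assumption stated in the message above, namely that projections never include dropped columns; MockRelAttr, MockOutCol and mock_expand_live_columns are hypothetical names, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One attribute of a table's rowtype descriptor (simplified). */
typedef struct MockRelAttr
{
    int  attnum;        /* original column number in the table */
    bool attisdropped;
} MockRelAttr;

/* One column of a projection result (simplified). */
typedef struct MockOutCol
{
    int attnum;         /* contiguous number in the result */
    int origattnum;     /* where it came from, tracked separately */
} MockOutCol;

/*
 * Build the output columns for "all live columns" of a table: dropped
 * attributes are skipped and the survivors are renumbered from 1.
 * The caller provides out with room for natts entries.
 * Returns the number of output columns.
 */
static int
mock_expand_live_columns(const MockRelAttr *rel, int natts, MockOutCol *out)
{
    int nout = 0;

    assert(rel != NULL && out != NULL);
    for (int i = 0; i < natts; i++)
    {
        if (rel[i].attisdropped)
            continue;
        out[nout].attnum = nout + 1;
        out[nout].origattnum = rel[i].attnum;
        nout++;
    }
    return nout;
}
```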



Re: checking my understanding of TupleDesc

From
Andres Freund
Date:
Hi,

On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
> >      and under what other
> >      circumstances one would only encounter 'cleaned up' TupleDescs with
> >      no dropped attributes, and contiguous numbers for the real ones?
> 
> I don't believe we ever include dropped columns in a projection result,
> so generally speaking, the output of a query plan node wouldn't have them.
> 
> There's a semi-exception, which is that the planner might decide that we
> can skip a projection step for the output of a table scan node, in which
> case dropped columns would be included in its output.  But that would only
> be true if there are upper plan nodes that are doing some projections of
> their own.  The final query output will definitely not have them.

I *think* we don't even do that, because build_physical_tlist() bails
out if there's a dropped (or missing) column. Or are you thinking of
something else?

Greetings,

Andres Freund



Re: checking my understanding of TupleDesc

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
>> There's a semi-exception, which is that the planner might decide that we
>> can skip a projection step for the output of a table scan node, in which
>> case dropped columns would be included in its output.  But that would only
>> be true if there are upper plan nodes that are doing some projections of
>> their own.  The final query output will definitely not have them.

> I *think* we don't even do that, because build_physical_tlist() bails
> out if there's a dropped (or missing) column.

Ah, right.  Probably because we need to insist on every column of an
execution-time tupdesc having a valid atttypid ... although I wonder,
is that really necessary?

            regards, tom lane



Re: checking my understanding of TupleDesc

From
Andres Freund
Date:
Hi,

On 2019-11-12 18:20:56 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
> >> There's a semi-exception, which is that the planner might decide that we
> >> can skip a projection step for the output of a table scan node, in which
> >> case dropped columns would be included in its output.  But that would only
> >> be true if there are upper plan nodes that are doing some projections of
> >> their own.  The final query output will definitely not have them.
> 
> > I *think* we don't even do that, because build_physical_tlist() bails
> > out if there's a dropped (or missing) column.
> 
> Ah, right.  Probably because we need to insist on every column of an
> execution-time tupdesc having a valid atttypid ... although I wonder,
> is that really necessary?

Yea, the stated reasoning is ExecTypeFromTL():
 *
 * Exception: if there are any dropped or missing columns, we punt and return
 * NIL.  Ideally we would like to handle these cases too.  However this
 * creates problems for ExecTypeFromTL, which may be asked to build a tupdesc
 * for a tlist that includes vars of no-longer-existent types.  In theory we
 * could dig out the required info from the pg_attribute entries of the
 * relation, but that data is not readily available to ExecTypeFromTL.
 * For now, we don't apply the physical-tlist optimization when there are
 * dropped cols.

I think the main problem is that we don't even have a convenient way to
identify that a targetlist expression is actually a dropped column, and
treat that differently. If we were to expand physical tlists to cover
dropped and missing columns, we'd need to be able to add error checks to
at least ExecInitExprRec, and to printtup_prepare_info().

I wonder if we could get away with making build_physical_tlist()
return a TargetEntry for a Const instead of a Var for the dropped
columns? That'd contain enough information for tuple deforming to work
on higher query levels?  Or perhaps we ought to invent a DroppedVar
node that includes the type information? That'd make it trivial to
error out when such an expression is actually evaluated, and would
allow such columns to be detected.  We already put Const nodes in some
places like that IIRC...

Greetings,

Andres Freund
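
The "punt" behavior described in the quoted comment can be modeled roughly like this. It is a simplified sketch, not the real build_physical_tlist(): MockAttr and mock_build_physical_tlist are hypothetical, the atthasmissing field is assumed here as the marker for a "missing" column, and a return value of -1 stands in for returning NIL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a relation attribute. */
typedef struct MockAttr
{
    int  attnum;
    bool attisdropped;   /* column was dropped */
    bool atthasmissing;  /* assumed marker for a "missing" column */
} MockAttr;

/*
 * Try to build a "physical" target list that mirrors the table's columns
 * one for one. On success, store each column number in tlist_attnums and
 * return the entry count. If any column is dropped or missing, give up and
 * return -1 (standing in for NIL), leaving tlist_attnums untouched.
 */
static int
mock_build_physical_tlist(const MockAttr *attrs, int natts,
                          int *tlist_attnums)
{
    assert(attrs != NULL && tlist_attnums != NULL);

    /* First pass: bail out before emitting anything. */
    for (int i = 0; i < natts; i++)
    {
        if (attrs[i].attisdropped || attrs[i].atthasmissing)
            return -1;
    }

    /* Second pass: one entry per physical column, in order. */
    for (int i = 0; i < natts; i++)
        tlist_attnums[i] = attrs[i].attnum;

    return natts;
}
```

With this behavior a scan node never passes dropped columns upward through the physical-tlist shortcut, because the shortcut is not taken at all for such tables.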



Re: checking my understanding of TupleDesc

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-11-12 18:20:56 -0500, Tom Lane wrote:
>> Ah, right.  Probably because we need to insist on every column of an
>> execution-time tupdesc having a valid atttypid ... although I wonder,
>> is that really necessary?

> Yea, the stated reasoning is ExecTypeFromTL():
> [ ExecTypeFromTL needs to see subexpressions with valid data types ]

> I wonder if we could get away with making build_physical_tlist()
> return a TargetEntry for a Const instead of a Var for the dropped
> columns? That'd contain enough information for tuple deforming to work
> on higher query levels?  Or perhaps we ought to invent a DroppedVar
> node that includes the type information? That'd make it trivial to
> error out when such an expression is actually evaluated, and would
> allow such columns to be detected.  We already put Const nodes in some
> places like that IIRC...

Yeah, a DroppedVar thing might not be a bad idea, it could substitute
for the dummy null constants we currently use.  Note that an interesting
property of such a node is that it doesn't actually *have* a type.
A dropped column might be of a type that's been dropped too (and,
if memory serves, we reset the column's atttypid to zero anyway).
What we'd have to do is excavate attlen and attalign from the
pg_attribute entry and store those in the DroppedVar node.  Then,
anything reconstructing a tupdesc would have to use those fields
and avoid a pg_type lookup.

I'm not sure whether the execution-time behavior of such a node
ought to be "throw error" or just "return NULL".  The precedent
of the dummy constants suggests the latter.  What would error out
is anything that wants to extract an actual type OID from the
expression.

            regards, tom lane
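
To make the DroppedVar idea concrete, here is one way such a node could behave, taking the "return NULL" option for evaluation. Everything in this sketch is hypothetical: no DroppedVar node is described as existing, and MockDroppedVar, MockAttr and the mock_* functions are invented for illustration. The node carries only the length and alignment copied from the pg_attribute entry, so a descriptor entry can be rebuilt without any type lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int MockOid;

/* Hypothetical node: physical layout info only, no type OID. */
typedef struct MockDroppedVar
{
    int   attnum;    /* position of the dropped column */
    short attlen;    /* storage length copied from pg_attribute */
    char  attalign;  /* alignment code copied from pg_attribute */
} MockDroppedVar;

/* Simplified descriptor entry. */
typedef struct MockAttr
{
    int     attnum;
    MockOid atttypid;     /* 0 for a dropped column */
    short   attlen;
    char    attalign;
    bool    attisdropped;
} MockAttr;

/* Rebuild a descriptor entry from the node alone, with no type lookup. */
static void
mock_attr_from_dropped_var(const MockDroppedVar *var, MockAttr *att)
{
    assert(var != NULL && att != NULL);
    att->attnum = var->attnum;
    att->atttypid = 0;
    att->attlen = var->attlen;
    att->attalign = var->attalign;
    att->attisdropped = true;
}

/* Evaluation follows the dummy-constant precedent: always NULL. */
static long
mock_eval_dropped_var(const MockDroppedVar *var, bool *isnull)
{
    assert(var != NULL && isnull != NULL);
    *isnull = true;
    return 0;
}

/*
 * Asking for a type OID is the operation that fails. Returning false
 * stands in for raising an error.
 */
static bool
mock_dropped_var_type(const MockDroppedVar *var, MockOid *typid)
{
    assert(var != NULL && typid != NULL);
    *typid = 0;
    return false;
}
```

The "throw error" alternative would differ only in mock_eval_dropped_var, which would report failure instead of setting the null flag.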