Thread: checking my understanding of TupleDesc
From looking around the code, I've made these tentative observations
about TupleDescs:

1. If the TupleDesc was obtained straight from the relcache for some
   relation, then all of its attributes should have nonzero attrelid
   identifying that relation, but in (every? nearly every?) other case,
   the attributes found in a TupleDesc will have a dummy attrelid of zero.

2. The attributes in a TupleDesc will (always?) have consecutive attnum
   corresponding to their positions in the TupleDesc (and therefore
   redundant). A query, say, that projects out a subset of columns
   from a relation will not have a result TupleDesc with attributes
   still bearing their original attrelid and attnum; they'll have
   attrelid zero and consecutive renumbered attnum.

Something like SendRowDescriptionCols_3 that wants the original table
and attnum has to reconstruct them from the targetlist if available.

Have I mistaken any of that?

Thanks,
-Chap
On 09/29/19 20:13, Chapman Flack wrote:
> From looking around the code, I've made these tentative observations
> about TupleDescs:
>
> 1. If the TupleDesc was obtained straight from the relcache for some
>    relation, then all of its attributes should have nonzero attrelid
>    identifying that relation, but in (every? nearly every?) other case,
>    the attributes found in a TupleDesc will have a dummy attrelid of zero.
>
> 2. The attributes in a TupleDesc will (always?) have consecutive attnum
>    corresponding to their positions in the TupleDesc (and therefore
>    redundant). A query, say, that projects out a subset of columns
>    from a relation will not have a result TupleDesc with attributes
>    still bearing their original attrelid and attnum; they'll have
>    attrelid zero and consecutive renumbered attnum.
>
> Something like SendRowDescriptionCols_3 that wants the original table
> and attnum has to reconstruct them from the targetlist if available.
>
> Have I mistaken any of that?

And one more:

3. One could encounter a TupleDesc with one or more 'attisdropped'
   attributes, which do have their original attnums (corresponding
   to their positions in the TupleDesc and therefore redundant),
   so the attnums of nondropped attributes may be discontiguous.

   In building a corresponding tuple, any dropped attribute should
   have its null flag set.

Is it simple to say under what circumstances a TupleDesc possibly
with dropped members could be encountered, and under what other
circumstances one would only encounter 'cleaned up' TupleDescs with
no dropped attributes, and contiguous numbers for the real ones?

Regards,
-Chap
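[Editorial sketch] The three observations can be illustrated with a small, self-contained model. The structs below are simplified stand-ins, not the real PostgreSQL definitions; `MockAttr`, `MockTupleDesc`, `attnums_match_positions` and `mark_dropped_as_null` are hypothetical names introduced only for this example. In the backend one would presumably inspect `TupleDescAttr(tupdesc, i)->attisdropped` and pass the null flags to `heap_form_tuple()`, but nothing here depends on those headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration, NOT the real PostgreSQL types. */
typedef unsigned int MockOid;

typedef struct MockAttr
{
    MockOid attrelid;      /* owning relation, or 0 when not from the relcache */
    int     attnum;        /* 1-based column number */
    bool    attisdropped;  /* true for a dropped column */
} MockAttr;

typedef struct MockTupleDesc
{
    int       natts;
    MockAttr *attrs;
} MockTupleDesc;

/* Observations 2 and 3: every entry's attnum equals its position + 1. */
bool
attnums_match_positions(const MockTupleDesc *desc)
{
    assert(desc != NULL);
    for (int i = 0; i < desc->natts; i++)
    {
        if (desc->attrs[i].attnum != i + 1)
            return false;
    }
    return true;
}

/*
 * Observation 3: when building a tuple, set the null flag for every
 * dropped attribute. Returns the number of dropped attributes found.
 */
int
mark_dropped_as_null(const MockTupleDesc *desc, bool *isnull)
{
    int ndropped = 0;

    assert(desc != NULL && isnull != NULL);
    for (int i = 0; i < desc->natts; i++)
    {
        if (desc->attrs[i].attisdropped)
        {
            isnull[i] = true;
            ndropped++;
        }
    }
    return ndropped;
}
```

Note that the dropped entry keeps its slot and its attnum, so the positions stay consecutive even though the attnums of the live columns do not.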
Chapman Flack <chap@anastigmatix.net> writes:
> On 09/29/19 20:13, Chapman Flack wrote:
>> From looking around the code, I've made these tentative observations
>> about TupleDescs:
>>
>> 1. If the TupleDesc was obtained straight from the relcache for some
>> relation, then all of its attributes should have nonzero attrelid
>> identifying that relation, but in (every? nearly every?) other case,
>> the attributes found in a TupleDesc will have a dummy attrelid of zero.

I'm not sure about every vs. nearly every, but otherwise this seems
accurate. Generally attrelid is meaningful in a pg_attribute catalog
entry, but not in TupleDescs in memory. It appears valid in relcache
entry tupdescs only because they are built straight from pg_attribute.

>> 2. The attributes in a TupleDesc will (always?) have consecutive attnum
>> corresponding to their positions in the TupleDesc (and therefore
>> redundant).

Correct.

> And one more:

> 3. One could encounter a TupleDesc with one or more 'attisdropped'
> attributes, which do have their original attnums (corresponding
> to their positions in the TupleDesc and therefore redundant),
> so the attnums of nondropped attributes may be discontiguous.

Right.

> Is it simple to say under what circumstances a TupleDesc possibly
> with dropped members could be encountered,

Any tupdesc that's describing the rowtype of a table with dropped columns
would look like that.

> and under what other
> circumstances one would only encounter 'cleaned up' TupleDescs with
> no dropped attributes, and contiguous numbers for the real ones?

I don't believe we ever include dropped columns in a projection result,
so generally speaking, the output of a query plan node wouldn't have them.

There's a semi-exception, which is that the planner might decide that we
can skip a projection step for the output of a table scan node, in which
case dropped columns would be included in its output. But that would only
be true if there are upper plan nodes that are doing some projections of
their own. The final query output will definitely not have them.

regards, tom lane
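[Editorial sketch] To make the contrast between a table's rowtype descriptor and a projection result concrete, here is a minimal model of how a projected descriptor ends up with zero attrelid and renumbered attnums. It is an illustration under stated assumptions only: `MockAttr` and `project_attrs` are hypothetical names, and the real executor builds result descriptors from target lists, not from arrays of column numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration, NOT the real PostgreSQL types. */
typedef unsigned int MockOid;

typedef struct MockAttr
{
    MockOid attrelid;      /* owning relation, or 0 in a computed descriptor */
    int     attnum;        /* 1-based column number */
    bool    attisdropped;  /* true for a dropped column */
} MockAttr;

/*
 * Build the attribute array for a projection over src[0..nsrc-1].
 * cols[] holds 1-based source attnums in output order. The output
 * entries get attrelid 0 and consecutive attnums starting at 1.
 * Returns the number of output attributes, or -1 if a requested column
 * is out of range or dropped (a projection never emits dropped columns).
 */
int
project_attrs(const MockAttr *src, int nsrc,
              const int *cols, int ncols, MockAttr *out)
{
    assert(src != NULL && cols != NULL && out != NULL);
    for (int i = 0; i < ncols; i++)
    {
        int srcno = cols[i];

        if (srcno < 1 || srcno > nsrc || src[srcno - 1].attisdropped)
            return -1;

        out[i].attrelid = 0;        /* original relation is not recorded */
        out[i].attnum = i + 1;      /* renumbered by output position */
        out[i].attisdropped = false;
    }
    return ncols;
}
```

Selecting source columns 3 and 1 from a three-column table whose second column is dropped yields a two-entry descriptor numbered 1 and 2; the original table and attnum would have to be recovered elsewhere, as the thread notes for the target list.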
Hi,

On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
> > and under what other
> > circumstances one would only encounter 'cleaned up' TupleDescs with
> > no dropped attributes, and contiguous numbers for the real ones?
>
> I don't believe we ever include dropped columns in a projection result,
> so generally speaking, the output of a query plan node wouldn't have them.
>
> There's a semi-exception, which is that the planner might decide that we
> can skip a projection step for the output of a table scan node, in which
> case dropped columns would be included in its output. But that would only
> be true if there are upper plan nodes that are doing some projections of
> their own. The final query output will definitely not have them.

I *think* we don't even do that, because build_physical_tlist() bails
out if there's a dropped (or missing) column. Or are you thinking of
something else?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
>> There's a semi-exception, which is that the planner might decide that we
>> can skip a projection step for the output of a table scan node, in which
>> case dropped columns would be included in its output. But that would only
>> be true if there are upper plan nodes that are doing some projections of
>> their own. The final query output will definitely not have them.

> I *think* we don't even do that, because build_physical_tlist() bails
> out if there's a dropped (or missing) column.

Ah, right. Probably because we need to insist on every column of an
execution-time tupdesc having a valid atttypid ... although I wonder,
is that really necessary?

regards, tom lane
Hi,

On 2019-11-12 18:20:56 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-11-12 17:39:20 -0500, Tom Lane wrote:
> >> There's a semi-exception, which is that the planner might decide that we
> >> can skip a projection step for the output of a table scan node, in which
> >> case dropped columns would be included in its output. But that would only
> >> be true if there are upper plan nodes that are doing some projections of
> >> their own. The final query output will definitely not have them.
>
> > I *think* we don't even do that, because build_physical_tlist() bails
> > out if there's a dropped (or missing) column.
>
> Ah, right. Probably because we need to insist on every column of an
> execution-time tupdesc having a valid atttypid ... although I wonder,
> is that really necessary?

Yea, the stated reasoning is ExecTypeFromTL():

 *
 * Exception: if there are any dropped or missing columns, we punt and return
 * NIL. Ideally we would like to handle these cases too. However this
 * creates problems for ExecTypeFromTL, which may be asked to build a tupdesc
 * for a tlist that includes vars of no-longer-existent types. In theory we
 * could dig out the required info from the pg_attribute entries of the
 * relation, but that data is not readily available to ExecTypeFromTL.
 * For now, we don't apply the physical-tlist optimization when there are
 * dropped cols.

I think the main problem is that we don't even have a convenient way to
identify that a targetlist expression is actually a dropped column, and
treat that differently. If we were to expand physical tlists to cover
dropped and missing columns, we'd need to be able to add error checks to
at least ExecInitExprRec, and to printtup_prepare_info().

I wonder if we could get away with making build_physical_tlist()
return a TargetEntry for a Const instead of a Var for the dropped
columns? That'd contain enough information for tuple deforming to work
on higher query levels? Or perhaps we ought to invent a DroppedVar
node, that includes the type information? That'd make it trivial to
error out when such an expression is actually evaluated, and allow us to
detect such columns. We already put Const nodes in some places like
that IIRC...

Greetings,

Andres Freund
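[Editorial sketch] The "punt" rule described in the quoted comment can be modelled in a few lines. This is not the planner's code: `MockAttr` and `physical_tlist_allowed` are hypothetical names, and the flags merely mirror the idea of a column being dropped or "missing" as discussed above. The sketch returns a boolean where the real function is described as returning NIL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for illustration, NOT the real PostgreSQL type. */
typedef struct MockAttr
{
    int  attnum;         /* 1-based column number */
    bool attisdropped;   /* column has been dropped */
    bool atthasmissing;  /* column relies on a stored "missing" value */
} MockAttr;

/*
 * Decide whether a "physical" target list (one entry per table column,
 * no projection) may be used. Mirrors the quoted rule: give up as soon
 * as any column is dropped or missing.
 */
bool
physical_tlist_allowed(const MockAttr *attrs, int natts)
{
    assert(attrs != NULL || natts == 0);
    for (int i = 0; i < natts; i++)
    {
        if (attrs[i].attisdropped || attrs[i].atthasmissing)
            return false;   /* punt: caller falls back to a projection */
    }
    return true;
}
```

Under this rule a scan over a table with any dropped column always goes through a projection, which is why its output descriptor would not contain dropped entries.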
Andres Freund <andres@anarazel.de> writes:
> On 2019-11-12 18:20:56 -0500, Tom Lane wrote:
>> Ah, right. Probably because we need to insist on every column of an
>> execution-time tupdesc having a valid atttypid ... although I wonder,
>> is that really necessary?

> Yea, the stated reasoning is ExecTypeFromTL():
> [ ExecTypeFromTL needs to see subexpressions with valid data types ]

> I wonder if we could get away with making build_physical_tlist()
> return a TargetEntry for a Const instead of a Var for the dropped
> columns? That'd contain enough information for tuple deforming to work
> on higher query levels? Or perhaps we ought to invent a DroppedVar
> node, that includes the type information? That'd make it trivial to
> error out when such an expression is actually evaluated, and allow us to
> detect such columns. We already put Const nodes in some places like
> that IIRC...

Yeah, a DroppedVar thing might not be a bad idea, it could substitute
for the dummy null constants we currently use. Note that an interesting
property of such a node is that it doesn't actually *have* a type.
A dropped column might be of a type that's been dropped too (and, if
memory serves, we reset the column's atttypid to zero anyway). What
we'd have to do is excavate atttyplen and attalign from the pg_attribute
entry and store those in the DroppedVar node. Then, anything
reconstructing a tupdesc would have to use those fields and avoid a
pg_type lookup.

I'm not sure whether the execution-time behavior of such a node ought
to be "throw error" or just "return NULL". The precedent of the dummy
constants suggests the latter. What would error out is anything that
wants to extract an actual type OID from the expression.

regards, tom lane
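[Editorial sketch] The DroppedVar idea is only a proposal in this thread, so the following is purely hypothetical: `MockDroppedVar`, `MockAttr`, `fill_attr_from_dropped_var` and `eval_dropped_var_isnull` are invented names, and the sketch assumes the length and alignment would be copied from the column's catalog entry (the mock calls the length field `attlen`). It shows how a descriptor entry could be rebuilt from such a node without any type lookup, and models the "return NULL" evaluation behavior rather than the "throw error" alternative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration, NOT the real PostgreSQL types. */
typedef unsigned int MockOid;

/* Hypothetical node: carries no type OID, only physical layout facts. */
typedef struct MockDroppedVar
{
    int  attnum;    /* column position in the scanned relation */
    int  attlen;    /* fixed byte length, or negative if variable-length */
    char attalign;  /* alignment code copied from the catalog entry */
} MockDroppedVar;

typedef struct MockAttr
{
    MockOid atttypid;      /* 0 means "no valid type" */
    int     attnum;
    int     attlen;
    char    attalign;
    bool    attisdropped;
} MockAttr;

/*
 * Rebuild a descriptor entry from the node alone. No type lookup is
 * performed; atttypid is left at 0 because the type may no longer exist.
 */
void
fill_attr_from_dropped_var(const MockDroppedVar *dv, MockAttr *att)
{
    assert(dv != NULL && att != NULL);
    att->atttypid = 0;
    att->attnum = dv->attnum;
    att->attlen = dv->attlen;
    att->attalign = dv->attalign;
    att->attisdropped = true;
}

/* "Return NULL" variant of evaluation: the result is always null. */
bool
eval_dropped_var_isnull(const MockDroppedVar *dv)
{
    assert(dv != NULL);
    return true;
}
```

With length and alignment available, code that walks a tuple's storage could still step over the dropped column, while any caller that insists on a real type OID would see zero and have to report an error.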