Thread: Better support for whole-row operations and composite types
We have a number of issues revolving around the fact that composite types
(row types) aren't first-class objects. I think it's past time to fix that.
Here are some notes about doing it. I am not sure all these ideas are
fully-baked ... comments appreciated.

When represented as a Datum, the format of a row-type object needs to be
something like this:

* overall length: int4 (this makes the Datum a valid varlena item)
* row type id: Oid (either a composite type id or RECORDOID)
* row type typmod: int4 (see below for usage)
-- pad if needed to MAXALIGN boundary
* heap tuple representation, beginning with a HeapTupleHeaderData struct

If we do it exactly as above then we will be wasting some space, because the
xmin/xmax/cmax and ctid fields of HeapTupleHeaderData are of no use in a row
that isn't actually a table member row. It is very tempting to overlay the
length and rowtype fields with the HeapTupleHeaderData struct. This would
save some code as well as space --- see discussion below.

Only named composite types, not RECORD, will be allowed to be used as table
column types. This ensures that any row object stored on disk will have a
valid composite type ID embedded in it, so that the row structure can be
retrieved when the row is read. However, we want to be able to support row
objects in memory that are of transient record types (for example, the
output of a function returning RECORD will have a record type determined by
the query itself). I propose that we handle this case by setting the type
id to RECORDOID and using the typmod to identify the particular record type
--- the typmod will essentially be an index into a backend-local cache of
record types. More detail below.

We'll add "tdtypeid" and "tdtypmod" fields to TupleDesc structs. This will
make it easy to set the embedded type information correctly when
manufacturing a row datum using a TupleDesc. For TupleDescs associated with
relations, tdtypeid is just the relation's row type OID, and tdtypmod is -1.
For TupleDescs representing transient row types, we initially set tdtypeid
to RECORDOID and tdtypmod to -1 (indicating a completely anonymous row
type). If the row type actually needs to be identifiable then we establish
a cache entry for it and set the typmod to an index for the cache entry. I
think this will only need to happen when the query contains a
function-returning-RECORD or a whole-row variable referencing what would
otherwise be an anonymous row type, such as a JOIN result.

Composite types, as well as the RECORD type, will be marked in pg_type as
pass-by-ref, varlena (typlen -1), typalign 'd'. (We will use the maximum
alignment always to avoid any dependency on types of the contained columns.)

The present function call and return conventions involving TupleTableSlots
will be replaced by simply passing and returning these row objects as
pass-by-reference Datums. In the case of functions returning rowtypes,
we'll continue to support the present ReturnSetInfo convention for
returning a separate TupleDesc describing the result type --- but this will
just be a crosscheck.

We will be able to make generic I/O routines for composite types,
comparable to those used now for arrays. Not sure what a convenient
external format would look like. (Possibly use the same conventions as for
a 1-D array?) We will need to make the convention that the type OID of a
composite type is passed to the input routine, in the same way that an
array input routine gets the typelem OID; else the input routine won't know
what to do.

We could also think about allowing functions that are declared as accepting
RECORD (ie, polymorphic-across-row-types functions). They would use the
same methods already used by polymorphic functions to find out the true
types of their inputs. (Might be best to invent a separate pseudotype, say
ANYRECORD, rather than overloading RECORD for this purpose.)
The recently developed SRF API is a bit unfortunate since it exposes the
assumption that a TupleTableSlot must be involved in returning a tuple. If
we don't overlay the Datum header with HeapTupleHeader then I think we have
to make TupleGetDatum copy the passed tuple and insert the row type info
from the slot's tupledesc, which'd be pretty inefficient because it means
making an extra copy of the row data. But if we do overlay the header
fields, then I think we can set up backwards-compatibility definitions in
which the slot is simply ignored. Specifically:

TupleDescGetSlot: no-op, returns NULL
TupleGetDatum: ignore slot, return tuple t_data pointer as datum

This will work because heap_formtuple and BuildTupleFromCStrings can return
a HeapTuple whose t_data part is already a valid row Datum, simply by
setting the appropriate length and type fields in it. (If the tuple is ever
stored to disk as a regular table row, these fields will be overwritten
with xmin/cmin info at that time.)

To convert a row Datum into something that can be passed to heap_getattr,
one could use a local variable of type HeapTupleData and set its t_data
field to the datum's pointer value. t_len is copied from the datum
contents, while the other fields of HeapTupleData can just be set to
zeroes.

ExecEvalVar for a whole-row reference will need to copy the scan tuple so
that it can insert the correct length and tuple type fields. (We cannot
scribble on the tuple as it sits in the disk buffer, of course.)
Fortunately this shouldn't be a major memory leak anymore since the copy
can be made in the current short-lived memory context.

Handling anonymous RECORD types
-------------------------------

I envision expanding typcache.c to be able to store TupleDesc structures
for composite and record types. In the case of regular composite types this
is not especially difficult. For record types, we are essentially trying to
make a backend-local mapping from typmod values to TupleDescs.
There are a couple of interesting points:

* We have to be able to re-use an already-existing cache entry if it
matches a requested TupleDesc. This avoids indefinite growth of the type
cache over many queries. There could still be issues with memory leakage if
a single backend session uses a huge number of distinct record types over
its lifetime, but that doesn't seem likely to be an issue in practice. (We
could avoid this problem by recycling no-longer-needed cache entries, but
what with plan caching I'm not sure there's any pleasant way to do that.
For the moment I intend that cache entries for record types will live for
the life of the backend.)

* Since record typmod values are backend-local, they aren't meaningful in
query structures stored on disk. When a stored rule is read in, we'll need
to be able to replace any embedded typmod values with correct assignments
for the current backend.

Safely storing composite types on disk
--------------------------------------

If a composite row value contains any out-of-line TOAST references, we'd
have to expand those references before we could safely store the value on
disk. This can be handled by the same tuptoaster.c routines that are
already concerned with replacing unsafe references.

ALTER TABLE issues
------------------

If an ALTER TABLE command does something that requires examining or
changing every row of a table, it would presumably have to do the same to
all entries in any composite-type column of the table's rowtype. To avoid
surprises and interesting debates about who has permissions to do this, it
might be wise to restrict on-disk composite columns to be only of
standalone composite types (ie, those made with CREATE TYPE AS). This
restriction would also avoid debates about whether table constraints apply
to composite-type columns.

Notes
-----

While doing this, we should once and for all rip out the last vestiges of
the "attisset" feature.
Add an Assert to ExecEvalVar that checks that whole-row vars (and, I guess,
any system column as well) are fetched from a scan tuple, never the inner
or outer side of a join. If they've not been converted into ordinary field
references in a join, it's too late.

The current API for TypeGetTupleDesc is somewhat bogus --- I don't think
the "column alias" option is really appropriate, and it is lacking a typmod
argument so it can't be used with record types. We shall have to deprecate
it in favor of a new routine.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:

> We have a number of issues revolving around the fact that composite types
> (row types) aren't first-class objects. I think it's past time to fix
> that. ...
> Only named composite types, not RECORD, will be allowed to be used as
> table column types.

If I understand what you're talking about, you would be allowed to CREATE
TYPE a composite type, like say, "address", and then use that as a datatype
all over your database? And then if you find "address" needs a new field
you can add it to the type and automatically have it added all over your
database to any table column using that type?

Speaking as a user, that would be **very** nice. I've often found myself
wishing for just such a feature. It would simplify data model maintenance a
whole heck of a lot.

How will client programs see the data if I do a "select *"? In my ideal
world it would be shipped over in a binary representation that a driver
would translate to a perl hash / php array / whatever. But maybe it would
be simpler to just ship over the subcolumns with names like
"shipping.line_1" and "shipping.country".

--
greg
Greg Stark <gsstark@mit.edu> writes:

> If I understand what you're talking about, you would be allowed to
> CREATE TYPE a composite type, like say, "address", and then use that as
> a datatype all over your database? And then if you find "address"
> needs a new field you can add it to the type and automatically have it
> added all over your database to any table column using that type?

I believe that would work, though you might have some issues with cached
plans.

> How will client programs see the data if I do a "select *"?

TBD.

			regards, tom lane
Tom Lane wrote:

> We have a number of issues revolving around the fact that composite types
> (row types) aren't first-class objects. I think it's past time to fix
> that. Here are some notes about doing it. I am not sure all these ideas
> are fully-baked ... comments appreciated.

[snip]

> Only named composite types, not RECORD, will be allowed to be used as
> table column types.

[snip]

Interesting. I'm slightly curious to know if there's an external driver for
this. Will this apply recursively (an a has a b which has an array of
c's)? Are there indexing implications? Could one index on a subfield?

cheers

andrew
Tom,

> We have a number of issues revolving around the fact that composite types
> (row types) aren't first-class objects. I think it's past time to fix
> that. Here are some notes about doing it. I am not sure all these ideas
> are fully-baked ... comments appreciated.

I'll want to add to the documentation on composite types, then. We'll need
a stern warning to users not to abuse them. Easily done, I think.

Composite types are frequently abused by OO and Windows programmers to
break the relational model. I used to be an MSDN member (thank you, I've
recovered) and frequently ran into, on the mailing list, users getting
themselves into some unresolvable mess because they'd used composite types
in SQL Server to combine several rows ... or even effectively an entire
child table ... into one field.

Otherwise, looks good to me. I don't think I'm qualified to second-guess
you on the implementation.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Only named composite types, not RECORD, will be allowed to be used as
>> table column types.

> Interesting. I'm slightly curious to know if there's an external driver
> for this.

There's noplace to store a permanent record of an anonymous rowtype's
structure. To do otherwise would amount to executing an implicit CREATE
TYPE AS for the user, so we might as well just say up front that you have
to create the type.

> Will this apply recursively (an a has a b which has an array of c's)?

Yup.

> Are there indexing implications? Could one index on a subfield?

Using an expression index, sure. I don't think we need to support it as a
"primitive" index type.

			regards, tom lane
Tom Lane wrote:
> We have a number of issues revolving around the fact that composite
> types (row types) aren't first-class objects. I think it's past time
> to fix that. Here are some notes about doing it. I am not sure all
> these ideas are fully-baked ... comments appreciated.

[Sorry for the delay in responding]

Nice work, and in general it makes sense to me. A few comments below.

> We will be able to make generic I/O routines for composite types,
> comparable to those used now for arrays. Not sure what a convenient
> external format would look like. (Possibly use the same conventions
> as for a 1-D array?)

So you mean like an array, but with possibly mixed datatypes?

	'{1 , "abc def", 2.3}'

Seems to make sense. Another option might be to use the ROW keyword,
something like:

	ROW[1 , 'abc', 2.3]

> We could also think about allowing functions that are declared as
> accepting RECORD (ie, polymorphic-across-row-types functions). They
> would use the same methods already used by polymorphic functions to
> find out the true types of their inputs. (Might be best to invent a
> separate pseudotype, say ANYRECORD, rather than overloading RECORD
> for this purpose.)

Check. I really like this idea.

> TupleDescGetSlot: no-op, returns NULL
> TupleGetDatum: ignore slot, return tuple t_data pointer as datum
>
> This will work because heap_formtuple and BuildTupleFromCStrings can
> return a HeapTuple whose t_data part is already a valid row Datum,
> simply by setting the appropriate length and type fields in it. (If
> the tuple is ever stored to disk as a regular table row, these fields
> will be overwritten with xmin/cmin info at that time.)

Is this the way you did things in your recent commit?

> To convert a row Datum into something that can be passed to
> heap_getattr, one could use a local variable of type HeapTupleData
> and set its t_data field to the datum's pointer value.
> t_len is copied from the datum contents, while the other fields of
> HeapTupleData can just be set to zeroes.

I think I understand this, but an example would help.

> * We have to be able to re-use an already-existing cache entry if it
> matches a requested TupleDesc.

For anonymous record types, how will that lookup be done efficiently? Can
the hash key be an array of attribute oids?

> If an ALTER TABLE command does something that requires examining or
> changing every row of a table, it would presumably have to do the
> same to all entries in any composite-type column of the table's
> rowtype. To avoid surprises and interesting debates about who has
> permissions to do this, it might be wise to restrict on-disk
> composite columns to be only of standalone composite types (ie, those
> made with CREATE TYPE AS). This restriction would also avoid debates
> about whether table constraints apply to composite-type columns.

I agree.

As an aside, it would be quite useful to have support for arrays of tuples.
Any idea on how to do that without needing to define an explicit array type
for each tuple type?

Joe
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> We will be able to make generic I/O routines for composite types,
>> comparable to those used now for arrays. Not sure what a convenient
>> external format would look like. (Possibly use the same conventions
>> as for a 1-D array?)

> So you mean like an array, but with possibly mixed datatypes?
> '{1 , "abc def", 2.3}'
> Seems to make sense.

The unresolved question in my mind is how to represent NULL elements.
However, we have to solve that sooner or later for arrays too. Any
thoughts?

> Another option might be to use the ROW keyword, something like:
> ROW[1 , 'abc', 2.3]

This is a separate issue, just as the ARRAY[] constructor has different
uses from the array I/O representation. I do want some kind of runtime
constructor, but ROW[...] doesn't get the job done because it doesn't
provide any place to specify the rowtype name. Maybe we could combine
ROW[...] with some sort of cast notation?

	ROW[1 , 'abc', 2.3] :: composite_type_name
	CAST(ROW[1 , 'abc', 2.3] AS composite_type_name)

Does SQL99 provide any guidance here?

>> TupleDescGetSlot: no-op, returns NULL
>> TupleGetDatum: ignore slot, return tuple t_data pointer as datum
>>
>> This will work because heap_formtuple and BuildTupleFromCStrings can
>> return a HeapTuple whose t_data part is already a valid row Datum,
>> simply by setting the appropriate length and type fields in it. (If
>> the tuple is ever stored to disk as a regular table row, these fields
>> will be overwritten with xmin/cmin info at that time.)

> Is this the way you did things in your recent commit?

Almost. I ended up keeping TupleDescGetSlot as a live function, but its
true purpose is only to ensure that the tupledesc gets registered with the
type cache (see BlessTupleDesc() in CVS tip). The slot per se never gets
used. I believe that CVS tip is source-code-compatible with existing SRFs,
even though I adjusted all the ones in the distribution to stop using the
TupleTableSlot stuff.
The main point though is that row Datums now contain sufficient info
embedded in them to allow runtime type lookup the same as we do for arrays.

>> To convert a row Datum into something that can be passed to
>> heap_getattr, one could use a local variable of type HeapTupleData
>> and set its t_data field to the datum's pointer value. t_len is
>> copied from the datum contents, while the other fields of
>> HeapTupleData can just be set to zeroes.

> I think I understand this, but an example would help.

There are several in the PL sources now, for instance plpgsql does this
with an incoming rowtype argument:

	if (!fcinfo->argnull[i])
	{
		HeapTupleHeader td;
		Oid			tupType;
		int32		tupTypmod;
		TupleDesc	tupdesc;
		HeapTupleData tmptup;

		td = DatumGetHeapTupleHeader(fcinfo->arg[i]);
		/* Extract rowtype info and find a tupdesc */
		tupType = HeapTupleHeaderGetTypeId(td);
		tupTypmod = HeapTupleHeaderGetTypMod(td);
		tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
		/* Build a temporary HeapTuple control structure */
		tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
		ItemPointerSetInvalid(&(tmptup.t_self));
		tmptup.t_tableOid = InvalidOid;
		tmptup.t_data = td;

		exec_move_row(&estate, NULL, row, &tmptup, tupdesc);
	}

This is okay because the HeapTupleData is not needed after the call to
exec_move_row.

>> * We have to be able to re-use an already-existing cache entry if it
>> matches a requested TupleDesc.

> For anonymous record types, how will that lookup be done efficiently?
> Can the hash key be an array of attribute oids?

Right, that's the way I did it. See src/backend/utils/cache/typcache.c

> As an aside, it would be quite useful to have support for arrays of
> tuples. Any idea on how to do that without needing to define an explicit
> array type for each tuple type?

Hmm, messy ...

I wonder now whether we still really need a separate pg_type entry for
every array type.
The original motivation for doing that has been at least partly subsumed by
storing element type OIDs inside the arrays themselves. I wonder if we
could go over to a scheme where, say, atttypid is the base type ID and
attndims being nonzero is what you check to find out it's really an array
of atttypid. Not sure how we could map that idea into function and
expression args/results, though.

Plan B would be to go ahead and create array types. Not sure I would want
to do this for table rowtypes, but if we did it only for CREATE TYPE AS
then it doesn't sound like an unreasonable amount of overhead.

			regards, tom lane
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> So you mean like an array, but with possibly mixed datatypes?
>> '{1 , "abc def", 2.3}'
>> Seems to make sense.
>
> The unresolved question in my mind is how to represent NULL elements.
> However, we have to solve that sooner or later for arrays too. Any
> thoughts?

Good point. What's really ugly is that the external representation of
string types differs depending on whether quotes are needed or not. If
strings were *always* surrounded by quotes, we could just use the word
NULL, without the quotes.

>> Another option might be to use the ROW keyword, something like:
>> ROW[1 , 'abc', 2.3]
>
> This is a separate issue, just as the ARRAY[] constructor has different
> uses from the array I/O representation. I do want some kind of runtime
> constructor, but ROW[...] doesn't get the job done because it doesn't
> provide any place to specify the rowtype name. Maybe we could combine
> ROW[...] with some sort of cast notation?
>
> ROW[1 , 'abc', 2.3] :: composite_type_name
> CAST(ROW[1 , 'abc', 2.3] AS composite_type_name)
>
> Does SQL99 provide any guidance here?

The latter seems to agree with 6.12 (<cast specification>) of SQL2003. I'd
think we'd want the former supported anyway as an extension to the
standard.

> Almost. I ended up keeping TupleDescGetSlot as a live function, but its
> true purpose is only to ensure that the tupledesc gets registered with
> the type cache (see BlessTupleDesc() in CVS tip). The slot per se never
> gets used. I believe that CVS tip is source-code-compatible with
> existing SRFs, even though I adjusted all the ones in the distribution
> to stop using the TupleTableSlot stuff.

Almost compatible. I found that, to my surprise, PL/R compiles with no
changes after your commit. However it now segfaults (as I expected) on
composite type arguments. Should be easy to fix though (I think, really
haven't looked at it hard yet).
> The main point though is that row Datums now contain sufficient info
> embedded in them to allow runtime type lookup the same as we do for
> arrays.

Sounds good to me.

> There are several in the PL sources now, for instance plpgsql does this
> with an incoming rowtype argument:

Perfect -- thanks.

>> As an aside, it would be quite useful to have support for arrays of
>> tuples. Any idea on how to do that without needing to define an explicit
>> array type for each tuple type?
>
> Hmm, messy ...
>
> I wonder now whether we still really need a separate pg_type entry for
> every array type. The original motivation for doing that has been at
> least partly subsumed by storing element type OIDs inside the arrays
> themselves. I wonder if we could go over to a scheme where, say,
> atttypid is the base type ID and attndims being nonzero is what you
> check to find out it's really an array of atttypid. Not sure how we
> could map that idea into function and expression args/results, though.

Hmmm. I had thought maybe we could use a single datatype (anyarray?) with
in/out functions that would need to do the right thing based on the element
type. This would also allow, for example, arrays-of-arrays, which is the
way that SQL99/2003 seem to allow for multidimensional arrays.

> Plan B would be to go ahead and create array types. Not sure I would
> want to do this for table rowtypes, but if we did it only for CREATE
> TYPE AS then it doesn't sound like an unreasonable amount of overhead.

I was hoping we wouldn't need to do that.

Joe
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> ... I believe that CVS tip is source-code-compatible with
>> existing SRFs, even though I adjusted all the ones in the distribution
>> to stop using the TupleTableSlot stuff.

> Almost compatible. I found that, to my surprise, PL/R compiles with no
> changes after your commit. However it now segfaults (as I expected) on
> composite type arguments. Should be easy to fix though (I think, really
> haven't looked at it hard yet).

Let me know what you find out --- if I missed a trick on compatibility,
there's still plenty of time to fix it.

>> ... I wonder if we could go over to a scheme where, say,
>> atttypid is the base type ID and attndims being nonzero is what you
>> check to find out it's really an array of atttypid. Not sure how we
>> could map that idea into function and expression args/results, though.

> Hmmm. I had thought maybe we could use a single datatype (anyarray?)
> with in/out functions that would need to do the right thing based on the
> element type.

If we have just one datatype, how will the parser determine the type of a
"foo[subscript]" expression? After thinking a bit, I don't see how to do
that except by adding an out-of-line decoration to the underlying type,
somewhat like we do for "setof" or atttypmod. This is doable as far as the
backend itself is concerned, but the compatibility implications for clients
and user-written extensions seem daunting :-(

			regards, tom lane
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> Almost compatible. I found that, to my surprise, PL/R compiles with no
>> changes after your commit. However it now segfaults (as I expected) on
>> composite type arguments. Should be easy to fix though (I think, really
>> haven't looked at it hard yet).
>
> Let me know what you find out --- if I missed a trick on compatibility,
> there's still plenty of time to fix it.

I still haven't had time to look closely, and well may have been doing
something non-standard all along, but in any case this is the current
failing code:

	else if (function->arg_is_rel[i])
	{
		/* for tuple args, convert to a one row data.frame */
		TupleTableSlot *slot = (TupleTableSlot *) arg[i];
		HeapTuple	tuples = slot->val;
		TupleDesc	tupdesc = slot->ttc_tupleDescriptor;

		PROTECT(el = pg_tuple_get_r_frame(1, &tuples, tupdesc));
	}

The problem was (I think -- I'll check a little later) that
slot->ttc_tupleDescriptor is now '\0'.

>> Hmmm. I had thought maybe we could use a single datatype (anyarray?)
>> with in/out functions that would need to do the right thing based on the
>> element type.
>
> If we have just one datatype, how will the parser determine the type of
> a "foo[subscript]" expression? After thinking a bit, I don't see how to
> do that except by adding an out-of-line decoration to the underlying
> type, somewhat like we do for "setof" or atttypmod. This is doable as
> far as the backend itself is concerned, but the compatibility
> implications for clients and user-written extensions seem daunting :-(

I'll think-about/play-with this some more, hopefully this weekend.

Thanks,

Joe
Joe Conway <mail@joeconway.com> writes:
> I still haven't had time to look closely, and well may have been doing
> something non-standard all along, but in any case this is the current
> failing code:

> 	/* for tuple args, convert to a one row data.frame */
> 	TupleTableSlot *slot = (TupleTableSlot *) arg[i];
> 	HeapTuple	tuples = slot->val;
> 	TupleDesc	tupdesc = slot->ttc_tupleDescriptor;

Um. Well, the arg is not a TupleTableSlot * anymore, so this is guaranteed
to fail. This isn't part of what I thought the documented SRF API was
though. If you take the arg[i] value and pass it to GetAttributeByName or
GetAttributeByNum it will work (with some compiler warnings) and AFAICS we
never documented more than that.

			regards, tom lane
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> /* for tuple args, convert to a one row data.frame */
>> TupleTableSlot *slot = (TupleTableSlot *) arg[i];
>> HeapTuple tuples = slot->val;
>> TupleDesc tupdesc = slot->ttc_tupleDescriptor;
>
> Um. Well, the arg is not a TupleTableSlot * anymore, so this is
> guaranteed to fail. This isn't part of what I thought the documented
> SRF API was though.

I'm sure you're correct. The SRF API was for user defined functions, not
procedural languages anyway. I'll look at how the other procedural
languages handle tuple arguments.

> If you take the arg[i] value and pass it to GetAttributeByName or
> GetAttributeByNum it will work (with some compiler warnings) and
> AFAICS we never documented more than that.

OK, thanks,

Joe
Joe Conway <mail@joeconway.com> writes:
> ... The SRF API was for user defined functions, not
> procedural languages anyway. I'll look at how the other procedural
> languages handle tuple arguments.

It was a dozen-or-so-lines change in each of the PL languages AFAIR. You
will probably also want to look at what you do to return tuple results.

			regards, tom lane
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> ... The SRF API was for user defined functions, not
>> procedural languages anyway. I'll look at how the other procedural
>> languages handle tuple arguments.
>
> It was a dozen-or-so-lines change in each of the PL languages AFAIR.
> You will probably also want to look at what you do to return tuple
> results.

OK, thanks. Just for reference, what is arg[i] if it isn't a
(TupleTableSlot *) anymore -- is it just a HeapTuple?

Joe
Joe Conway <mail@joeconway.com> writes:
> Just for reference, what is arg[i] if it isn't a (TupleTableSlot *)
> anymore -- is it just a HeapTuple?

No, it's a HeapTupleHeader pointer. You need to reconstruct a HeapTuple on
top of that to work with heap_getattr and most other core backend routines.

Also don't forget to ensure that you detoast the datum; this is not useful
at the moment but will be important Real Soon Now. I added standard
argument-fetch macros to fmgr.h to help with the detoasting bit.

			regards, tom lane
Tom Lane wrote:
> No, it's a HeapTupleHeader pointer. You need to reconstruct a
> HeapTuple on top of that to work with heap_getattr and most other
> core backend routines.

Thanks. For triggers, I was previously building up the arguments thus:

	slot = TupleDescGetSlot(tupdesc);
	slot->val = trigdata->tg_trigtuple;
	arg[7] = PointerGetDatum(slot);

I suppose now I should do this instead?

	arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);

> Also don't forget to ensure that you detoast the datum; this is not
> useful at the moment but will be important Real Soon Now. I added
> standard argument-fetch macros to fmgr.h to help with the detoasting
> bit.

OK. This is the net result:

	#ifdef PG_VERSION_75_COMPAT
		Oid			tupType;
		int32		tupTypmod;
		TupleDesc	tupdesc;
		HeapTuple	tuple = palloc(sizeof(HeapTupleData));
		HeapTupleHeader tuple_hdr = DatumGetHeapTupleHeader(arg[i]);

		tupType = HeapTupleHeaderGetTypeId(tuple_hdr);
		tupTypmod = HeapTupleHeaderGetTypMod(tuple_hdr);
		tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

		tuple->t_len = HeapTupleHeaderGetDatumLength(tuple_hdr);
		ItemPointerSetInvalid(&(tuple->t_self));
		tuple->t_tableOid = InvalidOid;
		tuple->t_data = tuple_hdr;

		PROTECT(el = pg_tuple_get_r_frame(1, &tuple, tupdesc));
		pfree(tuple);
	#else
		TupleTableSlot *slot = (TupleTableSlot *) arg[i];
		HeapTuple	tuple = slot->val;
		TupleDesc	tupdesc = slot->ttc_tupleDescriptor;

		PROTECT(el = pg_tuple_get_r_frame(1, &tuple, tupdesc));
	#endif /* PG_VERSION_75_COMPAT */

Given the above changes, it's almost working now -- only problem left is
with triggers:

	insert into foo values(11,'cat99',1.89);
	+ ERROR: record type has not been registered
	+ CONTEXT: In PL/R function rejectfoo

	delete from foo;
	+ ERROR: cache lookup failed for type 0
	+ CONTEXT: In PL/R function rejectfoo

(and a few other similar failures)

Any ideas why the trigger tuple type isn't registered, or what I'm doing
wrong?

Thanks,

Joe
Joe Conway wrote:
> Given the above changes, it's almost working now -- only problem left
> is with triggers:
>
> insert into foo values(11,'cat99',1.89);
> + ERROR: record type has not been registered
> + CONTEXT: In PL/R function rejectfoo
>
> delete from foo;
> + ERROR: cache lookup failed for type 0
> + CONTEXT: In PL/R function rejectfoo
>
> (and a few other similar failures)
>
> Any ideas why the trigger tuple type isn't registered, or what I'm
> doing wrong?

A little more info on this. It appears that the tuple type is set to either
2249 (RECORDOID) or 0. In the case of RECORDOID this traces all the way
back to here:

	/* ----------------------------------------------------------------
	 * CreateTemplateTupleDesc
	 *
	 * This function allocates and zeros a tuple descriptor structure.
	 *
	 * Tuple type ID information is initially set for an anonymous record
	 * type; caller can overwrite this if needed.
	 * ----------------------------------------------------------------
	 */

But the type id is never overwritten for a BEFORE INSERT trigger. It
appears that somewhere it is explicitly set to InvalidOid for both BEFORE
DELETE and AFTER INSERT triggers (and possibly others).

My take is that we now need to explicitly set the tuple type id for
INSERT/UPDATE/DELETE statements -- not sure where the best place to do that
is though. Does this sound correct?

Joe
Joe Conway <mail@joeconway.com> writes:
> For triggers, I was previously building up the arguments thus:
> 	slot = TupleDescGetSlot(tupdesc);
> 	slot->val = trigdata->tg_trigtuple;
> 	arg[7] = PointerGetDatum(slot);

> I suppose now I should do this instead?
> 	arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);

Hm, no, that won't work because a tuple being passed to a trigger probably
isn't going to contain valid type information. The API for calling
triggers is different from calling ordinary functions, so I never thought
about trying to make it look the same. At what point are you trying to do
the above, anyway?

			regards, tom lane
Joe Conway <mail@joeconway.com> writes:
>> Any ideas why the trigger tuple type isn't registered, or what I'm doing
>> wrong?

> A little more info on this. It appears that the tuple type is set to
> either 2249 (RECORDOID) or 0.

After further thought, we could possibly make it work for BEFORE triggers,
but there's just no way for AFTER triggers: in that case what you are
getting is an image of what went to disk, which is going to contain
transaction info not type info.

If you really want the trigger API for PL/R to be indistinguishable from
the function-call API, then I think you will need to copy the passed tuple
and insert type information. This is more or less what ExecEvalVar does
now in the whole-tuple case (the critical code is actually in
heap_getsysattr though):

	HeapTupleHeader dtup;

	dtup = (HeapTupleHeader) palloc(tup->t_len);
	memcpy((char *) dtup, (char *) tup->t_data, tup->t_len);

	HeapTupleHeaderSetDatumLength(dtup, tup->t_len);
	HeapTupleHeaderSetTypeId(dtup, tupleDesc->tdtypeid);
	HeapTupleHeaderSetTypMod(dtup, tupleDesc->tdtypmod);

	result = PointerGetDatum(dtup);

			regards, tom lane
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> For triggers, I was previously building up the arguments thus:
>> 	slot = TupleDescGetSlot(tupdesc);
>> 	slot->val = trigdata->tg_trigtuple;
>> 	arg[7] = PointerGetDatum(slot);
>
>> I suppose now I should do this instead?
>> 	arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);
>
> Hm, no, that won't work because a tuple being passed to a trigger
> probably isn't going to contain valid type information. The API for
> calling triggers is different from calling ordinary functions, so
> I never thought about trying to make it look the same. At what point
> are you trying to do the above, anyway?

That's a shame -- it used to work fine -- done this way so the same
function could handle tuple arguments to regular functions, and old/new
tuples to trigger functions. It is in plr_trigger_handler(); vaguely
similar to pltcl_trigger_handler(). I'll have to figure out a workaround I
guess.

Joe
Tom Lane wrote:
> If you really want the trigger API for PL/R to be indistinguishable from
> the function-call API, then I think you will need to copy the passed
> tuple and insert type information. This is more or less what
> ExecEvalVar does now in the whole-tuple case (the critical code is
> actually in heap_getsysattr though):

That got me there. It may not be the best in terms of pure speed, but it is
easier and simpler than refactoring, at least at the moment. And I don't
think the reason people will choose PL/R for triggers is speed in any case
;-)

Thanks!

Joe