Thread: Re: [HACKERS] Data types
(We were discussing support for complicated data types last October) OK, I think I understand most of what the parser does to handle data types and to handle type resolution. I've started rewriting the code involved with functions, operators, and targets, and have completed a trial implementation. I was wondering where some things which must happen were actually done, and it turns out that they aren't! For example: postgres=> create table c1 (c char(4)); postgres=> create table c2 (c char(4)); postgres=> insert into c1 values ('abc', 'abc'); postgres=> insert into c1 values ('def', 'def'); postgres=> insert into c2(c) select c || 'ghi' from c1; INSERT 0 2 postgres=> select * from c2; c |t -------+- abc ghi| def ghi| (2 rows) Note that our char(4) column happily ends up with strings 7 characters long! This is with pristine v6.3.1 code. OK, so to do things right, for, say, char() and for numeric(), we need a way to provide target properties to routines even though the characteristics of a particular value might be stored with the value. The current system usually stores string or array properties with the value; but even so we also need to provide any target properties to correctly handle function and operator output into target columns. In the example above, each input string of char(4) is concatenated with a string constant, and stored back into a char(4) column. The target column info must appear somewhere so a conversion or storage routine can fix up the values. > > > What about passing to functions references to some structures ? > > > struct ... { > > > TypeSpec type_spec; > > > ...some other things maybe... > > > Datum data; /* this is real data */ > > > } > > > type_spec could be precision/scale for NUMBERs and DECIMALs, > > > max len for (VAR)CHARs or other type specification value (pointer > > > to) for (other) user-defined types, so onecould define new type > > > with pointing of function(s) to handle type_spec for this type in > > > CREATE TABLE: > > > A_Column An_User_Type[(type-spec)] > > > Just like > > > A NUMBER(5,2)but more general (object-oriented). OK, so I'm thinking that we need to do something like this, but there are problems in doing it with explicit structures in the backend: 1) the backend might need to know too much about each type, damaging the extensibility 2) somehow the backend would need to be able to support dump/reload operations (so there needs to be a string representation of the type). 3) we need this part to be extensible to new types also. I'm not happy with the structure above, but I'm not sure why. It just seems pretty complicated, and doesn't really address issues 1-3 above. Well, finally a light came on: the characteristics of these complicated types should be a type also! For example, the numeric type would have a support definition type which contains two fields, the precision and scale, and which as part of the definition would know how to print itself out, read itself in, etc. Then, the backend would just need to be able to match up the type with the support type, and the adt code would need to be able to access both. I haven't gotten farther than this yet, but it seems like any solution has _got_ to take advantage of some of the existing type mechanisms so that we have access to input/output routines for the type support info. SQL doesn't naturally lend itself to full OO extensibility, but perhaps we can extend the backend to handle syntax like typename(characteristic, characteristic,...) where the characteristics have properties stored in the type/attribute system and the backend knows how to meld the type with the support type info (e.g. typeOutput = print("%s(%s)", typename, printTypeSupport) ). Does this ring a bell with anyone? Vadim? - Tom > > Yes . For a few data types (bpchar for varchar support) the system already passes > > more than just a single field structure to the type-handler code. It passes the > > pointer to the structure, and also a couple of other parameters including the > > maximum length. We could/should generalize this so that a descriptive structure > > can be passed in for every type-handler (at least for -in() and -out() functions, > > perhaps for all calls?) which is specialized for each data type and which is > ^^^^^^^^^ > Imho, for all. number_pl() should know about precision/scale of > both args... > > > defined _by_ the data type code itself. This basically means that data types can > > provide more "methods" to help with the type handling. > > > > The default behavior could be that NULL is passed for these extra arguments and > > type-handlers could choose to ignore it (so perhaps existing handlers could work > > without change). > > My suggestion is passing data in descriptive structure in all cases. > This will require user to re-write all user-defined functions... > > > > > For example, to implement NUMERIC(5,2) we would need to pass the precision and > > scale numbers to the type handlers (numeric_in(), etc) after saving them when > > defining the table/column. We also need a way to either save the original text > > definition or to reconstruct it so that dump/reload can work. > ^^^^^^^^^^^ > Type_spec handling functions could accept two args: one is > type_spec itself (in external/internal formats) and second - in/out > flag: in - to convert (5,2) into internal form, out - from > internal form to external representation. > I prefer this way. > > We could store type_spec in pg_attribute. This makes > pg_attribute tuple len variable (if we'll allow type_spec > has > 4 bytes len) and so many parts of code must be changed. > For the moment, we could allow only 4 bytes for type_spec - > it's ok for NUMBERs, DECIMALs, (VAR/BP)CHARs... > > Vadim
> I haven't gotten farther than this yet, but it seems like any solution > has _got_ to take advantage of some of the existing type mechanisms so > that we have access to input/output routines for the type support info. > SQL doesn't naturally lend itself to full OO extensibility, but perhaps > we can extend the backend to handle syntax like > > typename(characteristic, characteristic,...) > > where the characteristics have properties stored in the type/attribute > system and the backend knows how to meld the type with the support type > info (e.g. typeOutput = print("%s(%s)", typename, printTypeSupport) ). > > Does this ring a bell with anyone? Vadim? How does atttypmod fit/not fit the need here? It is passed to all input/output functions. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> > SQL doesn't naturally lend itself to full OO extensibility, but > > we can extend the backend to handle syntax like > > > > typename(characteristic, characteristic,...) > > > > where the characteristics have properties stored in the > > type/attribute system and the backend knows how to meld the type > > with the support type info (e.g. > > typeOutput = print("%s(%s)", typename, printTypeSupport) ). > How does atttypmod fit/not fit the need here? It is passed to all > input/output functions. At the moment, atttypmod is defined to be a two byte integer. I believe it is used to flag specific behaviors regarding type handling, and I'm not sure that, for example, it gives enough info and flexibility to keep track of both precision and scale for the numeric() type as well as do it's other jobs. atttypmod is passed to input/output routines, but I think we need a callable routine to convert internal representations also. That is, the input/output routines convert from and to C strings, but we also need a way to "convert" a type to itself (e.g. char(x) to char(4)), checking atttypmod and/or other type-specific information while doing so. It also needs a convention of some sort, or built-in to tables, so that this can be set up extensibly and on-the-fly by the parser code. Maybe if there were a routine defined in pg_proc which took a type and atttypmod as arguments and output that same type the parser could look for that and wrap it in a function call when converting types to targets. Maybe that would be enough? It would be similar to the "cast" convention we've adopted. I need to understand atttypmod's usage and capabilities better to know for sure; these are just my impressions right now. atttypmod's presence in the code is certainly a good marker of where we'd need to look to make changes (using atttypmod or something else). ?? - Tom
> > > > SQL doesn't naturally lend itself to full OO extensibility, but > > > we can extend the backend to handle syntax like > > > > > > typename(characteristic, characteristic,...) > > > > > > where the characteristics have properties stored in the > > > type/attribute system and the backend knows how to meld the type > > > with the support type info (e.g. > > > typeOutput = print("%s(%s)", typename, printTypeSupport) ). > > How does atttypmod fit/not fit the need here? It is passed to all > > input/output functions. > > At the moment, atttypmod is defined to be a two byte integer. I believe > it is used to flag specific behaviors regarding type handling, and I'm > not sure that, for example, it gives enough info and flexibility to keep > track of both precision and scale for the numeric() type as well as do > it's other jobs. I thought about this. Unless you are going to make such a thing for each type, you are going to have to over-load the storage for each type, so you would have something like: typedef numeric_typmod { char len; char prec; }; and cast the atttypmod int2 value into that structure, and read the field as two one-byte fields. > atttypmod is passed to input/output routines, but I think we need a > callable routine to convert internal representations also. That is, the > input/output routines convert from and to C strings, but we also need a > way to "convert" a type to itself (e.g. char(x) to char(4)), checking > atttypmod and/or other type-specific information while doing so. It also > needs a convention of some sort, or built-in to tables, so that this can > be set up extensibly and on-the-fly by the parser code. This should be possible. Now that atttypmod is passed around the backend through resdom, it should be available in almost all context. The trick is to pass it to the type-specific conversion functions. Should certainly be possible. The initial implemenation of atttypmod just passed it to input functions, and it was not passed through the backend. Now, it does, so we no longer do funny things with TupleDesc in the executor for char() and varchar(). They now get their atttypmod values set from Resdom, and those are passed to the output functions. > Maybe if there were a routine defined in pg_proc which took a type and > atttypmod as arguments and output that same type the parser could look > for that and wrap it in a function call when converting types to > targets. Maybe that would be enough? It would be similar to the "cast" > convention we've adopted. I need to understand atttypmod's usage and > capabilities better to know for sure; these are just my impressions > right now. Sounds like a plan. > atttypmod's presence in the code is certainly a good marker of where > we'd need to look to make changes (using atttypmod or something else). Let me know if you find any limitations. Initially, it was limited only to input, but now, it will be more useful. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)