Thread: Re: [HACKERS] Data types

Re: [HACKERS] Data types

From
"Thomas G. Lockhart"
Date:
(We were discussing support for complicated data types last October)

OK, I think I understand most of what the parser does to handle data
types and to handle type resolution. I've started rewriting the code
involved with functions, operators, and targets, and have completed a
trial implementation. I was wondering where some things which must
happen were actually done, and it turns out that they aren't! For
example:

postgres=> create table c1 (c char(4));
postgres=> create table c2 (c char(4));
postgres=> insert into c1 values ('abc', 'abc');
postgres=> insert into c1 values ('def', 'def');
postgres=> insert into c2(c) select c || 'ghi' from c1;
INSERT 0 2
postgres=> select * from c2;
c      |t
-------+-
abc ghi|
def ghi|
(2 rows)

Note that our char(4) column happily ends up with strings 7 characters
long! This is with pristine v6.3.1 code.

OK, so to do things right, for, say, char() and for numeric(), we need a
way to provide target properties to routines even though the
characteristics of a particular value might be stored with the value.
The current system usually stores string or array properties with the
value; but even so we also need to provide any target properties to
correctly handle function and operator output into target columns. In
the example above, each input string of char(4) is concatenated with a
string constant, and stored back into a char(4) column. The target
column info must appear somewhere so a conversion or storage routine can
fix up the values.

> > > What about passing to functions references to some structures ?
> > > struct ... {
> > >         TypeSpec        type_spec;
> > >         ...some other things maybe...
> > >         Datum           data;           /* this is real data */
> > > }
> > > type_spec could be precision/scale for NUMBERs and DECIMALs,
> > > max len for (VAR)CHARs or other type specification value (pointer > > > to) for (other) user-defined types, so
onecould define new type  
> > > with pointing of function(s) to handle type_spec for this type in
> > > CREATE TABLE:
> > >         A_Column                An_User_Type[(type-spec)]
> > > Just like
> > > A NUMBER(5,2)but more general (object-oriented).

OK, so I'm thinking that we need to do something like this, but there
are problems in doing it with explicit structures in the backend:
1) the backend might need to know too much about each type, damaging the
extensibility
2) somehow the backend would need to be able to support dump/reload
operations (so there needs to be a string representation of the type).
3) we need this part to be extensible to new types also.

I'm not happy with the structure above, but I'm not sure why. It just
seems pretty complicated, and doesn't really address issues 1-3 above.

Well, finally a light came on: the characteristics of these complicated
types should be a type also! For example, the numeric type would have a
support definition type which contains two fields, the precision and
scale, and which as part of the definition would know how to print
itself out, read itself in, etc. Then, the backend would just need to be
able to match up the type with the support type, and the adt code would
need to be able to access both.

I haven't gotten farther than this yet, but it seems like any solution
has _got_ to take advantage of some of the existing type mechanisms so
that we have access to input/output routines for the type support info.
SQL doesn't naturally lend itself to full OO extensibility, but perhaps
we can extend the backend to handle syntax like

  typename(characteristic, characteristic,...)

where the characteristics have properties stored in the type/attribute
system and the backend knows how to meld the type with the support type
info (e.g. typeOutput = print("%s(%s)", typename, printTypeSupport) ).

Does this ring a bell with anyone? Vadim?

                      - Tom

> > Yes . For a few data types (bpchar for varchar support) the system already passes
> > more than just a single field structure to the type-handler code. It passes the
> > pointer to the structure, and also a couple of other parameters including the
> > maximum length. We could/should generalize this so that a descriptive structure
> > can be passed in for every type-handler (at least for -in() and -out() functions,
> > perhaps for all calls?) which is specialized for each data type and which is
>               ^^^^^^^^^
> Imho, for all. number_pl() should know about precision/scale of
> both args...
>
> > defined _by_ the data type code itself. This basically means that data types can
> > provide more "methods" to help with the type handling.
> >
> > The default behavior could be that NULL is passed for these extra arguments and
> > type-handlers could choose to ignore it (so perhaps existing handlers could work
> > without change).
>
> My suggestion is passing data in descriptive structure in all cases.
> This will require user to re-write all user-defined functions...
>
> >
> > For example, to implement NUMERIC(5,2) we would need to pass the precision and
> > scale numbers to the type handlers (numeric_in(), etc) after saving them when
> > defining the table/column. We also need a way to either save the original text
> > definition or to reconstruct it so that dump/reload can work.
>                    ^^^^^^^^^^^
> Type_spec handling functions could accept two args: one is
> type_spec itself (in external/internal formats) and second - in/out
> flag: in - to convert (5,2) into internal form, out - from
> internal form to external representation.
> I prefer this way.
>
> We could store type_spec in pg_attribute. This makes
> pg_attribute tuple len variable (if we'll allow type_spec
> has > 4 bytes len) and so many parts of code must be changed.
> For the moment, we could allow only 4 bytes for type_spec -
> it's ok for NUMBERs, DECIMALs, (VAR/BP)CHARs...
>
> Vadim

Re: [HACKERS] Data types

From
Bruce Momjian
Date:
> I haven't gotten farther than this yet, but it seems like any solution
> has _got_ to take advantage of some of the existing type mechanisms so
> that we have access to input/output routines for the type support info.
> SQL doesn't naturally lend itself to full OO extensibility, but perhaps
> we can extend the backend to handle syntax like
>
>   typename(characteristic, characteristic,...)
>
> where the characteristics have properties stored in the type/attribute
> system and the backend knows how to meld the type with the support type
> info (e.g. typeOutput = print("%s(%s)", typename, printTypeSupport) ).
>
> Does this ring a bell with anyone? Vadim?

How does atttypmod fit/not fit the need here?  It is passed to all
input/output functions.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] Data types

From
"Thomas G. Lockhart"
Date:
> > SQL doesn't naturally lend itself to full OO extensibility, but
> > we can extend the backend to handle syntax like
> >
> >   typename(characteristic, characteristic,...)
> >
> > where the characteristics have properties stored in the
> > type/attribute system and the backend knows how to meld the type
> > with the support type info (e.g.
> >   typeOutput = print("%s(%s)", typename, printTypeSupport) ).
> How does atttypmod fit/not fit the need here?  It is passed to all
> input/output functions.

At the moment, atttypmod is defined to be a two byte integer. I believe
it is used to flag specific behaviors regarding type handling, and I'm
not sure that, for example, it gives enough info and flexibility to keep
track of both precision and scale for the numeric() type as well as do
it's other jobs.

atttypmod is passed to input/output routines, but I think we need a
callable routine to convert internal representations also. That is, the
input/output routines convert from and to C strings, but we also need a
way to "convert" a type to itself (e.g. char(x) to char(4)), checking
atttypmod and/or other type-specific information while doing so. It also
needs a convention of some sort, or built-in to tables, so that this can
be set up extensibly and on-the-fly by the parser code.

Maybe if there were a routine defined in pg_proc which took a type and
atttypmod as arguments and output that same type the parser could look
for that and wrap it in a function call when converting types to
targets. Maybe that would be enough? It would be similar to the "cast"
convention we've adopted. I need to understand atttypmod's usage and
capabilities better to know for sure; these are just my impressions
right now.

atttypmod's presence in the code is certainly a good marker of where
we'd need to look to make changes (using atttypmod or something else).

??

                           - Tom

Re: [HACKERS] Data types

From
Bruce Momjian
Date:
>
> > > SQL doesn't naturally lend itself to full OO extensibility, but
> > > we can extend the backend to handle syntax like
> > >
> > >   typename(characteristic, characteristic,...)
> > >
> > > where the characteristics have properties stored in the
> > > type/attribute system and the backend knows how to meld the type
> > > with the support type info (e.g.
> > >   typeOutput = print("%s(%s)", typename, printTypeSupport) ).
> > How does atttypmod fit/not fit the need here?  It is passed to all
> > input/output functions.
>
> At the moment, atttypmod is defined to be a two byte integer. I believe
> it is used to flag specific behaviors regarding type handling, and I'm
> not sure that, for example, it gives enough info and flexibility to keep
> track of both precision and scale for the numeric() type as well as do
> it's other jobs.

I thought about this.  Unless you are going to make such a thing for
each type, you are going to have to over-load the storage for each type,
so you would have something like:

    typedef numeric_typmod {
        char len;
        char prec;
    };

and cast the atttypmod int2 value into that structure, and read the
field as two one-byte fields.

> atttypmod is passed to input/output routines, but I think we need a
> callable routine to convert internal representations also. That is, the
> input/output routines convert from and to C strings, but we also need a
> way to "convert" a type to itself (e.g. char(x) to char(4)), checking
> atttypmod and/or other type-specific information while doing so. It also
> needs a convention of some sort, or built-in to tables, so that this can
> be set up extensibly and on-the-fly by the parser code.

This should be possible.  Now that atttypmod is passed around the
backend through resdom, it should be available in almost all context.
The trick is to pass it to the type-specific conversion functions.
Should certainly be possible.  The initial implemenation of atttypmod
just passed it to input functions, and it was not passed through the
backend.  Now, it does, so we no longer do funny things with TupleDesc
in the executor for char() and varchar().  They now get their atttypmod
values set from Resdom, and those are passed to the output functions.

> Maybe if there were a routine defined in pg_proc which took a type and
> atttypmod as arguments and output that same type the parser could look
> for that and wrap it in a function call when converting types to
> targets. Maybe that would be enough? It would be similar to the "cast"
> convention we've adopted. I need to understand atttypmod's usage and
> capabilities better to know for sure; these are just my impressions
> right now.

Sounds like a plan.

> atttypmod's presence in the code is certainly a good marker of where
> we'd need to look to make changes (using atttypmod or something else).

Let me know if you find any limitations.  Initially, it was limited only
to input, but now, it will be more useful.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)