Thread: type info refactoring

type info refactoring

From

Peter Eisentraut

Date:

31 October 2010, 03:03:23

Here's a big patch to avoid passing around type OID + typmod (+
collation) separately all over the place.  Instead, there is a new
struct TypeInfo that contains these fields, and only a pointer is passed
around.

Some stuff in here (or not in here) is probably a matter of taste, as
with any such large refactoring, but in general you can see that it
saves a lot of notational overhead.

Attachment

typeinfo-refactoring.patch.gz

Re: type info refactoring

From

Tom Lane

Date:

31 October 2010, 11:39:34

Peter Eisentraut <peter_e@gmx.net> writes:
> Here's a big patch to avoid passing around type OID + typmod (+
> collation) separately all over the place.  Instead, there is a new
> struct TypeInfo that contains these fields, and only a pointer is passed
> around.

> Some stuff in here (or not in here) is probably a matter of taste, as
> with any such large refactoring, but in general you can see that it
> saves a lot of notational overhead.

I can't escape the feeling that the principal result of this patch will
be a huge increase in palloc overhead.  It's certainly going to double
the number of nodes needed for common structures like Vars, Consts,
OpExprs, and FuncExprs.  Have you checked the effects on
parsing/planning runtime?

To my mind, the reason we have a distinction between type OID and typmod
is that for most operations, you know the type OID of the result but
not the typmod.  Trying to force typmod into every API that currently
works with type OIDs isn't going to alter that, so the net result will
just be a lot of inefficiency and extra notation to carry around
"I don't know" markers.

I'm not entirely sure where collation fits in this worldview, but I
think it is more nearly like typmod than type OID.

I wonder whether it wouldn't work better to keep type OID as it is,
and invent a separate struct that can carry type auxiliary information
(ie, typmod and collation, at present).  For the common case where you
don't have any auxiliary information, you just pass a NULL.

Also, maybe I missed this, but do we have a clear understanding of
how collation markers propagate through operations?  I seem to remember
that way back when, we were thinking of it as a runtime-determined
property --- for which, managing it like typmod would be quite useless.
This whole approach seems to depend on the assumption that collation
markers can be set at parse time, which isn't something I'm sure works.
        regards, tom lane

Re: type info refactoring

From

Heikki Linnakangas

Date:

31 October 2010, 13:50:40

On 31.10.2010 16:39, Tom Lane wrote:
> Peter Eisentraut<peter_e@gmx.net>  writes:
>> Here's a big patch to avoid passing around type OID + typmod (+
>> collation) separately all over the place.  Instead, there is a new
>> struct TypeInfo that contains these fields, and only a pointer is passed
>> around.
>
>> Some stuff in here (or not in here) is probably a matter of taste, as
>> with any such large refactoring, but in general you can see that it
>> saves a lot of notational overhead.
>
> I can't escape the feeling that the principal result of this patch will
> be a huge increase in palloc overhead.  It's certainly going to double
> the number of nodes needed for common structures like Vars, Consts,
> OpExprs, and FuncExprs.  Have you checked the effects on
> parsing/planning runtime?

Yeah, that was my first impression too. I assumed that TypeInfo would be 
embedded in other structs directly, rather than a pointer and palloc. 
Something like:

/* * TypeInfo - encapsulates type information */
typedef struct TypeInfo
{       Oid                     typeid;       int32           typmod;
} TypeInfo;
... typedef struct Const {        Expr            xpr;
-       Oid                     consttype;              /* pg_type OID 
of the constant's datatype */
-       int32           consttypmod;    /* typmod value, if any */
+       TypeInfo   consttype;          /* type information of the 
constant's datatype */        int                     constlen;               /* typlen of 
the constant's datatype */        Datum           constvalue;             /* the constant's value */

> Also, maybe I missed this, but do we have a clear understanding of
> how collation markers propagate through operations?  I seem to remember
> that way back when, we were thinking of it as a runtime-determined
> property --- for which, managing it like typmod would be quite useless.
> This whole approach seems to depend on the assumption that collation
> markers can be set at parse time, which isn't something I'm sure works.

Surely you have to be able to determine the collations at parse time, 
just like any other type information. I don't see how it could possible 
work at runtime. The collations involved affect the plan, whether you 
have to sort at various stages, for example.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: type info refactoring

From

Tom Lane

Date:

31 October 2010, 14:02:25

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> ... I assumed that TypeInfo would be 
> embedded in other structs directly, rather than a pointer and palloc. 

Yeah, that would avoid the extra-pallocs complaint, although it might be
notationally a bit of a PITA in places like equalfuncs.c.  I think that
would end up needing a separate COMPARE_TYPEINFO_FIELD macro instead of
being able to treat it like a Node* field.

But I'm still wondering whether it's smart to try to promote all of this
fundamentally-auxiliary information to first-class status.  It's really
unclear to me that that will end up being a net win either conceptually
or notationally.
        regards, tom lane

Re: type info refactoring

From

Robert Haas

Date:

31 October 2010, 15:30:43

On Sun, Oct 31, 2010 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> ... I assumed that TypeInfo would be
>> embedded in other structs directly, rather than a pointer and palloc.
>
> Yeah, that would avoid the extra-pallocs complaint, although it might be
> notationally a bit of a PITA in places like equalfuncs.c.  I think that
> would end up needing a separate COMPARE_TYPEINFO_FIELD macro instead of
> being able to treat it like a Node* field.
>
> But I'm still wondering whether it's smart to try to promote all of this
> fundamentally-auxiliary information to first-class status.  It's really
> unclear to me that that will end up being a net win either conceptually
> or notationally.

I think this is a chicken-and-egg problem.  Most of the things we use
typmod for are unimportant, because typmod doesn't get propagated
everywhere and therefore if you try to use it for anything that
actually matters, it'll break.  And on the flip side, there's no need
for typmod to get propagated everywhere, because it's not used for
anything all that important.  Blah!

It's true that if the ostensible maximum length of a string or the
precision of a numeric get lost somewhere on their path through the
system, probably nothing terribly awful will happen.  The worst case
is that those values won't be enforced someplace where the user might
expect it, and that's probably avoidable in most practical cases by
adding an appropriate cast.  I'm not sure whether it'll also be true
for collation, because that affects comparison semantics, and getting
the wrong comparison semantics is worse than failing to enforce a
maximum length.

And we keep having these pesky requests to embed more complex
information in the typmod, some of which are things that can't just be
lightly thrown away because we feel like it.  One of the more common
ones is "an OID", so you can have things like a range over a
designated base type, or a map from one base type to another base
type, or whatever.  Right now the on-disk representation of an array
includes a 4-byte OID to store the type of the elements in that array.That's almost pure evil.  Data in the database
shouldnot need to be 
self-identifying: imagine what our performance would look like if
every integer datum in the database had to contain a tag identifying
it as an integer.  Granting that we have no immediate ability to
change it, we should be thinking about what sort of infrastructure
would be needed to eliminate this type of kludgery, or at least make
it unnecessary for new types.

Long story short, I'm inclined to view any data structure that is
carrying only the type OID with great suspicion.  If the additional
information isn't needed today, it may well be tomorrow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: type info refactoring

From

Pavel Stehule

Date:

31 October 2010, 15:43:55

2010/10/31 Robert Haas <robertmhaas@gmail.com>:
> On Sun, Oct 31, 2010 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> ... I assumed that TypeInfo would be
>>> embedded in other structs directly, rather than a pointer and palloc.
>>
>> Yeah, that would avoid the extra-pallocs complaint, although it might be
>> notationally a bit of a PITA in places like equalfuncs.c.  I think that
>> would end up needing a separate COMPARE_TYPEINFO_FIELD macro instead of
>> being able to treat it like a Node* field.
>>
>> But I'm still wondering whether it's smart to try to promote all of this
>> fundamentally-auxiliary information to first-class status.  It's really
>> unclear to me that that will end up being a net win either conceptually
>> or notationally.
>
> I think this is a chicken-and-egg problem.  Most of the things we use
> typmod for are unimportant, because typmod doesn't get propagated
> everywhere and therefore if you try to use it for anything that
> actually matters, it'll break.  And on the flip side, there's no need
> for typmod to get propagated everywhere, because it's not used for
> anything all that important.  Blah!
>

yes, there is a few good possible features that's needs a better using of typmod

a) typmod for OUT varibles
b) enhanced polymorphic types - ANYELEMENT(x)

Regards

Pavel Stehule

Re: type info refactoring

From

Peter Eisentraut

Date:

31 October 2010, 18:59:04

On sön, 2010-10-31 at 10:39 -0400, Tom Lane wrote:
> To my mind, the reason we have a distinction between type OID and
> typmod
> is that for most operations, you know the type OID of the result but
> not the typmod.  Trying to force typmod into every API that currently
> works with type OIDs isn't going to alter that, so the net result will
> just be a lot of inefficiency and extra notation to carry around
> "I don't know" markers.

This patch doesn't introduce typmods into places that didn't deal with
them before.  It only replaces function calls and structures that had
separate arguments/fields for type OID and typmod with a single
argument/field.

Re: type info refactoring

From

Peter Eisentraut

Date:

31 October 2010, 19:01:22

On sön, 2010-10-31 at 18:50 +0200, Heikki Linnakangas wrote:
> Yeah, that was my first impression too. I assumed that TypeInfo would be 
> embedded in other structs directly, rather than a pointer and palloc. 
> Something like:
> 
> /*
>   * TypeInfo - encapsulates type information
>   */
> typedef struct TypeInfo
> {
>         Oid                     typeid;
>         int32           typmod;
> } TypeInfo;
> ...
>   typedef struct Const
>   {
>          Expr            xpr;
> -       Oid                     consttype;              /* pg_type OID 
> of the constant's datatype */
> -       int32           consttypmod;    /* typmod value, if any */
> +       TypeInfo   consttype;          /* type information of the 
> constant's datatype */
>          int                     constlen;               /* typlen of 
> the constant's datatype */
>          Datum           constvalue;             /* the constant's value */

That's another possibility, but you can't stick TypeInfo into a list
that way.

Re: type info refactoring

From

Peter Eisentraut

Date:

31 October 2010, 19:04:34

On sön, 2010-10-31 at 13:01 -0400, Tom Lane wrote:
> But I'm still wondering whether it's smart to try to promote all of
> this fundamentally-auxiliary information to first-class status.  It's
> really unclear to me that that will end up being a net win either
> conceptually or notationally.

Fair enough, but this patch arose from the discussion that the collation
patch had a lot of hunks that just changed (typeid, typmod) to (typeid,
typmod, collation) and that that could be isolated by collecting those
into a common data structure.  We can abandon this line of thought and
I'll go back to my original project, but I thought others who are
thinking about improving typmods could also benefit from this work.

Re: type info refactoring

From

Peter Eisentraut

Date:

31 October 2010, 19:13:59

On sön, 2010-10-31 at 14:30 -0400, Robert Haas wrote:
> It's true that if the ostensible maximum length of a string or the
> precision of a numeric get lost somewhere on their path through the
> system, probably nothing terribly awful will happen.  The worst case
> is that those values won't be enforced someplace where the user might
> expect it, and that's probably avoidable in most practical cases by
> adding an appropriate cast.  I'm not sure whether it'll also be true
> for collation, because that affects comparison semantics, and getting
> the wrong comparison semantics is worse than failing to enforce a
> maximum length.

I think the problem is rather that we don't have a good answer for what
to do about propagating and combining typmods in all the cases.  What
should varchar(10) || varchar(15) be?  Probably varchar(25).  What about
numeric(10) + numeric(15)?  What about numeric(10) * numeric(15)? etc.
If we had a generalized answer to that, it might be possible to
implement it in the right places.  (I'd guess it would be about half of
the size of the current collation patch.)

> Long story short, I'm inclined to view any data structure that is
> carrying only the type OID with great suspicion.  If the additional
> information isn't needed today, it may well be tomorrow.

Maybe, but again this patch doesn't solve that.  It just combines
existing OID + typmod into a single structure.  It doesn't add typmods
where there were none before.

Re: type info refactoring

From

Robert Haas

Date:

31 October 2010, 22:13:25

On Sun, Oct 31, 2010 at 6:13 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On sön, 2010-10-31 at 14:30 -0400, Robert Haas wrote:
>> It's true that if the ostensible maximum length of a string or the
>> precision of a numeric get lost somewhere on their path through the
>> system, probably nothing terribly awful will happen.  The worst case
>> is that those values won't be enforced someplace where the user might
>> expect it, and that's probably avoidable in most practical cases by
>> adding an appropriate cast.  I'm not sure whether it'll also be true
>> for collation, because that affects comparison semantics, and getting
>> the wrong comparison semantics is worse than failing to enforce a
>> maximum length.
>
> I think the problem is rather that we don't have a good answer for what
> to do about propagating and combining typmods in all the cases.  What
> should varchar(10) || varchar(15) be?  Probably varchar(25).  What about
> numeric(10) + numeric(15)?  What about numeric(10) * numeric(15)? etc.
> If we had a generalized answer to that, it might be possible to
> implement it in the right places.  (I'd guess it would be about half of
> the size of the current collation patch.)

I think the answer is that in some of those cases it doesn't matter,
and that just saying it's plain old numeric is fine.  But that's not
necessarily true for all possible uses of typmodish stuff.

>> Long story short, I'm inclined to view any data structure that is
>> carrying only the type OID with great suspicion.  If the additional
>> information isn't needed today, it may well be tomorrow.
>
> Maybe, but again this patch doesn't solve that.  It just combines
> existing OID + typmod into a single structure.  It doesn't add typmods
> where there were none before.

OK.  That seems like a good place to start.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company