Thread: Why don't we support external input/output functions for the composite types
Hi, I'm curious about composite types in PostgreSQL. By default, when we create a composite type, it utilizes the "record_in" and "record_out" functions for input/output. Do you think it would be beneficial to expand the syntax to allow users to specify custom input/output functions when creating composite types? Has anyone attempted this before, and are there any design challenges associated with it? Or is it simply not implemented because it's not seen as a valuable addition? I believe it would be beneficial because users creating a new type might prefer to define specific input/output syntax rather than conforming to what is accepted by the RECORD type. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
Dilip Kumar <dilipbalaut@gmail.com> writes: > I'm curious about composite types in PostgreSQL. By default, when we > create a composite type, it utilizes the "record_in" and "record_out" > functions for input/output. Do you think it would be beneficial to > expand the syntax to allow users to specify custom input/output > functions when creating composite types? No. > I believe it would be beneficial because users creating a new type > might prefer to define specific input/output syntax rather than > conforming to what is accepted by the RECORD type. The primary outcome would be to require a huge amount of new work to be done by a lot of software, much of it not under our control. And the impact wouldn't only be to software that would prefer not to know about this. For example, how likely do you think it is that these hypothetical user-defined I/O functions would cope well with ALTER TABLE/ALTER TYPE commands that change those rowtypes? regards, tom lane
Re: Why don't we support external input/output functions for the composite types
From
Dilip Kumar
Date:
On Thu, Apr 25, 2024 at 10:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Dilip Kumar <dilipbalaut@gmail.com> writes: > > I'm curious about composite types in PostgreSQL. By default, when we > > create a composite type, it utilizes the "record_in" and "record_out" > > functions for input/output. Do you think it would be beneficial to > > expand the syntax to allow users to specify custom input/output > > functions when creating composite types? > > No. > > > I believe it would be beneficial because users creating a new type > > might prefer to define specific input/output syntax rather than > > conforming to what is accepted by the RECORD type. > Thanks for the quick response, Tom. > The primary outcome would be to require a huge amount of new work > to be done by a lot of software, much of it not under our control. Yeah, I agree with that. > And the impact wouldn't only be to software that would prefer not > to know about this. For example, how likely do you think it is > that these hypothetical user-defined I/O functions would cope > well with ALTER TABLE/ALTER TYPE commands that change those > rowtypes? That's a good point. I was primarily focused on altering the representation of input and output values, rather than considering changes to internal storage. However, offering this feature could indeed allow users to influence how values are stored. And that can potentially affect ALTER TYPE because then we do not have control over how those values are stored internally. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Robert Haas
Date:
This thread caught my eye this morning, and I'm confused. On Thu, Apr 25, 2024 at 12:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The primary outcome would be to require a huge amount of new work > to be done by a lot of software, much of it not under our control. What, what is the "lot of software" that would have to be changed? It can't be existing extensions, because they wouldn't be forced into using this feature. Are you thinking that drivers or admin tools would need to support it? To me it seems like the only required changes would be to things that know how to parse the output of record_out(), and there is probably some of that, but the language you're using here is so emphatic as to make me suspect that you anticipate some larger impact. > And the impact wouldn't only be to software that would prefer not > to know about this. For example, how likely do you think it is > that these hypothetical user-defined I/O functions would cope > well with ALTER TABLE/ALTER TYPE commands that change those > rowtypes? Hmm. Dilip mentioned changing the storage format, but if you do that, you have bigger problems, like my_record_type.x no longer working. At that point I don't see why what you have is properly called a record type at all. So I guess what you're imagining here is that ALTER TABLE .. ALTER TYPE would try COERCION_PATH_COERCEVIAIO, but, uh so what? We could probably fix it so that such coercions were handled in some other way, but even if we didn't, it just means the user has to provide a USING clause, which is no different than what happens in any other case where coerce-via-I/O doesn't work out. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > This thread caught my eye this morning, and I'm confused. > On Thu, Apr 25, 2024 at 12:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The primary outcome would be to require a huge amount of new work >> to be done by a lot of software, much of it not under our control. > What, what is the "lot of software" that would have to be changed? I think this potentially affects stuff as low-level as drivers, particularly those that deal in binary not text I/O. It's not unreasonable for client code to assume that any type with typtype 'c' (composite) will adhere to the specifications at https://www.postgresql.org/docs/current/rowtypes.html#ROWTYPES-IO-SYNTAX especially because that section pretty much says in so many words that that's the case. > It > can't be existing extensions, because they wouldn't be forced into > using this feature. Are you thinking that drivers or admin tools would > need to support it? Yes. We've heard that argument about "this only affects extensions that choose to use it" before, and it's nonsense. As soon as you extend system-wide APIs, the consequences are system-wide: everybody has to cope with the new definition. >> For example, how likely do you think it is >> that these hypothetical user-defined I/O functions would cope >> well with ALTER TABLE/ALTER TYPE commands that change those >> rowtypes? > Hmm. Dilip mentioned changing the storage format, but if you do that, > you have bigger problems, like my_record_type.x no longer working. At > that point I don't see why what you have is properly called a record > type at all. Yup, I agree. > So I guess what you're imagining here is that ALTER TABLE > .. ALTER TYPE would try COERCION_PATH_COERCEVIAIO, but, uh so what? Uh, no. My point is that if you make a custom output function for "type complex (real float8, imaginary float8)", that function will probably crash pretty hard if what it's handed is something other than two float8s. But there is nothing to stop somebody from trying to ALTER the type to be two numerics or whatever. Conversely, the type's custom input function would likely keep on producing two float8s, yielding corrupt data so far as the rest of the system is concerned. You could imagine preventing such trouble by forbidding ALTER TYPE on types with custom I/O functions. But that just makes it even more obvious that what this is is a poorly-thought-through hack, rather than a feature that interoperates well with the rest of Postgres. I think that to the extent that there's a need for custom I/O of something like this, it should be handled by bespoke types, similar to (say) type point. regards, tom lane
Re: Why don't we support external input/output functions for the composite types
From
Robert Haas
Date:
On Thu, Apr 25, 2024 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yes. We've heard that argument about "this only affects extensions > that choose to use it" before, and it's nonsense. As soon as you > extend system-wide APIs, the consequences are system-wide: everybody > has to cope with the new definition. Sure. Any new feature has this problem to some extent. > Uh, no. My point is that if you make a custom output function > for "type complex (real float8, imaginary float8)", that function > will probably crash pretty hard if what it's handed is something > other than two float8s. But there is nothing to stop somebody > from trying to ALTER the type to be two numerics or whatever. > Conversely, the type's custom input function would likely keep on > producing two float8s, yielding corrupt data so far as the rest > of the system is concerned. I'm not sure I really buy this. Changing the column definitions amounts to changing the on-disk format, and no data type can survive a change to the on-disk format without updating the I/O functions to match. > I think that to the extent that there's a need for custom I/O > of something like this, it should be handled by bespoke types, > similar to (say) type point. I somewhat agree with this. The main disadvantage of that approach is that you lose the ability to directly refer to the members, which in some cases would be quite nice. I bet a lot of people would enjoy being able to write my_point.x and my_point.y instead of my_point[0] and my_point[1], for example. But maybe the solution to that is not $SUBJECT. A related problem is that, even if my_point behaved like a composite type, you'd have to write (my_point).x and (my_point).y to avoid something like: ERROR: missing FROM-clause entry for table "my_point" I think it's confusing and counterintuitive that putting parentheses around a subexpression completely changes the meaning. I don't know of any other programming language that behaves that way, and I find the way the "indirection" productions are coded in gram.y to be highly questionable. I suspect everything we currently treat as an indirection_el should instead be a way of constructing a new a_expr or c_expr or something like that, but I strongly suspect if I try to make the work I'll discover horrible problems I can't fix. Still, it's awful. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Apr 25, 2024 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Uh, no. My point is that if you make a custom output function >> for "type complex (real float8, imaginary float8)", that function >> will probably crash pretty hard if what it's handed is something >> other than two float8s. > I'm not sure I really buy this. Changing the column definitions > amounts to changing the on-disk format, and no data type can survive a > change to the on-disk format without updating the I/O functions to > match. What I'm trying to say is: given that the command "alter type T alter attribute A type foo" exists, users would reasonably expect the system to honor that on its own for any composite type, because that's what it does today. But it can't if T has custom I/O functions, at least not without understanding how to rewrite those functions. >> I think that to the extent that there's a need for custom I/O >> of something like this, it should be handled by bespoke types, >> similar to (say) type point. > I somewhat agree with this. The main disadvantage of that approach is > that you lose the ability to directly refer to the members, which in > some cases would be quite nice. I bet a lot of people would enjoy > being able to write my_point.x and my_point.y instead of my_point[0] > and my_point[1], for example. But maybe the solution to that is not > $SUBJECT. Nope, it isn't IMO. We already added infrastructure to allow arbitrary custom types to define subscripting operations. I think a case could be made to allow them to define field selection, too. > I think it's confusing and counterintuitive that putting parentheses > around a subexpression completely changes the meaning. I don't know of > any other programming language that behaves that way, I take it that you also don't believe that "2 + 3 * 4" should yield different results from "(2 + 3) * 4"? I could get behind offering an alternative notation, eg "a.b->c does the same thing as (a.b).c", if we could find a reasonable notation that doesn't infringe on user operator namespace. I think that might be hard to do though, and I don't think the existing notation is so awful than we should break existing operators to have an alternative. The business with deprecating => operators a few years ago had the excuse that "the standard says so", but we don't have that justification here. regards, tom lane
Re: Why don't we support external input/output functions for the composite types
From
Isaac Morland
Date:
On Thu, 25 Apr 2024 at 17:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think it's confusing and counterintuitive that putting parentheses
> around a subexpression completely changes the meaning. I don't know of
> any other programming language that behaves that way,
I take it that you also don't believe that "2 + 3 * 4" should yield
different results from "(2 + 3) * 4"?
In that expression "2 + 3" is not a subexpression, although "3 * 4" is, thanks to the operator precedence rules.
I could get behind offering an alternative notation, eg "a.b->c does
the same thing as (a.b).c", if we could find a reasonable notation
that doesn't infringe on user operator namespace. I think that might
be hard to do though, and I don't think the existing notation is so
awful than we should break existing operators to have an alternative.
The business with deprecating => operators a few years ago had the
excuse that "the standard says so", but we don't have that
justification here.
This is one of those areas where it will be difficult at best to do something which makes things work the way people expect without breaking other cases. I certainly would like to be able to use . to extract a field from a composite value without parenthesizing everything, but given the potential for having a schema name that matches a table or field name I would want to be very careful about changing anything.
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
I wrote: > I could get behind offering an alternative notation, eg "a.b->c does > the same thing as (a.b).c", if we could find a reasonable notation > that doesn't infringe on user operator namespace. I think that might > be hard to do though, and I don't think the existing notation is so > awful than we should break existing operators to have an alternative. > The business with deprecating => operators a few years ago had the > excuse that "the standard says so", but we don't have that > justification here. A different approach we could take is to implement the SQL99 rules for <identifier chain>, or at least move closer to that. Our existing rules for resolving qualified column references are more or less SQL92. I think the reasons we didn't do that when we first implemented SQL99 are (1) The SQL99 rules are fundamentally ambiguous, which they wave away by saying that it's user error if there's more than one way to interpret the reference. This approach is decidedly not nice, notably because it means that unrelated-looking changes in your schema can break your query. Having to check multiple interpretations slows parsing, too. (2) Switching from SQL92 to SQL99 rules would break some queries anyway. (At least, that's my recollection, though looking at the specs right now I don't see any case where SQL99 doesn't take a SQL92 alternative, so long as you don't run into (1).) Still, maybe it's time to think about changing? We could use the "the standard says so" excuse with anybody who complains. In the long run I wish we could ditch the SQL92 rules altogether and say that the head identifier of a qualified column reference must be a table's correlation name, not a schema or catalog name. There's zero good reason for the latter two cases, other than compatibility with thirty-year-old design mistakes. I kind of doubt we could make that fly though. regards, tom lane
Re: Why don't we support external input/output functions for the composite types
From
Robert Haas
Date:
On Thu, Apr 25, 2024 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I'm not sure I really buy this. Changing the column definitions > > amounts to changing the on-disk format, and no data type can survive a > > change to the on-disk format without updating the I/O functions to > > match. > > What I'm trying to say is: given that the command "alter type T alter > attribute A type foo" exists, users would reasonably expect the system > to honor that on its own for any composite type, because that's what > it does today. But it can't if T has custom I/O functions, at least > not without understanding how to rewrite those functions. I understand your point, but I don't agree with it. Ordinary users wouldn't be able to create types like this anyway, because there's no way we can allow an unprivileged user to set an input or output function. It would have to be restricted to superusers, just as we do for base types. And IMHO those have basically the same issue: you have to ensure that all the functions and operators that operate on the type, and any subscripting operations, are on the same page about what the underlying storage is. This doesn't seem different. It may well still be a bad idea for other reasons, or just kind of useless, but I disagree that it's a bad idea for that particular reason. > Nope, it isn't IMO. We already added infrastructure to allow > arbitrary custom types to define subscripting operations. I think a > case could be made to allow them to define field selection, too. That would be cool! > I take it that you also don't believe that "2 + 3 * 4" should yield > different results from "(2 + 3) * 4"? Isaac's rebuttal to this particular point was perfect; I have nothing to add. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Robert Haas
Date:
On Thu, Apr 25, 2024 at 5:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > A different approach we could take is to implement the SQL99 rules > for <identifier chain>, or at least move closer to that. Our > existing rules for resolving qualified column references are more > or less SQL92. I think the reasons we didn't do that when we first > implemented SQL99 are I'm not familiar with these rules. Do they allow stuff like a.b.c.d.e, or better yet, a.b(args).c(args).d(args).e(args)? > Still, maybe it's time to think about changing? We could use > the "the standard says so" excuse with anybody who complains. I certainly agree that if we're going to break stuff, breaking stuff to get closer to the standard is superior to other ways of breaking stuff. Without knowing what we'd get out of it, I don't have an opinion about whether it's worth it here or not, but making our syntax more like other programming languages and especially other popular database products does seem to me to have positive value. > In the long run I wish we could ditch the SQL92 rules altogether > and say that the head identifier of a qualified column reference > must be a table's correlation name, not a schema or catalog name. > There's zero good reason for the latter two cases, other than > compatibility with thirty-year-old design mistakes. I kind of > doubt we could make that fly though. Yeah, I think that would break too much stuff. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Apr 25, 2024 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What I'm trying to say is: given that the command "alter type T alter >> attribute A type foo" exists, users would reasonably expect the system >> to honor that on its own for any composite type, because that's what >> it does today. But it can't if T has custom I/O functions, at least >> not without understanding how to rewrite those functions. > I understand your point, but I don't agree with it. Ordinary users > wouldn't be able to create types like this anyway, because there's no > way we can allow an unprivileged user to set an input or output > function. It would have to be restricted to superusers, just as we do > for base types. Well, that would be one way of making the consistency problem be not our problem, but it would be a sad restriction. It'd void a lot of the arguable use-case for this feature, if you ask me. I realize that non-superusers couldn't create the C-language I/O functions that would be most at risk here, but you could imagine people building I/O functions in some other PL. (We'd have to remove the restriction that cstring isn't an allowed input or return type for user-defined functions; but AFAIK that's just a holdover from days when cstring was a lot more magic than it is now.) Maybe there's an argument that PL functions already have to be proof enough against datatype inconsistencies that nothing really awful could happen. Not sure. In any case, if we have to put strange restrictions on a composite type when it has custom I/O functions, then that still is an indication that the feature is a hack that doesn't play nice with the rest of the system. So I remain of the opinion that we shouldn't go there. If field selection support for custom types will solve the use-case, I find that a lot more attractive. >> I take it that you also don't believe that "2 + 3 * 4" should yield >> different results from "(2 + 3) * 4"? > Isaac's rebuttal to this particular point was perfect; I have nothing to add. As far as I could tell, Isaac's rebuttal was completely off-point. regards, tom lane
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Apr 25, 2024 at 5:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A different approach we could take is to implement the SQL99 rules >> for <identifier chain>, or at least move closer to that. > I'm not familiar with these rules. Do they allow stuff like a.b.c.d.e, > or better yet, a.b(args).c(args).d(args).e(args)? The former. <identifier chain> ::= <identifier> [ { <period> <identifier> }... ] The hard part is to figure out what the first identifier is: column name? table correlation name (AS name)? schema name or catalog name of a qualified table name? function parameter name? After that, as long as what you have is of composite type, you can drill down into it. If I'm reading SQL99 correctly, they deny allowing the first identifier to be a column name when there's more than one identifier, so that you must table-qualify a composite column before you can select a field from it. But they allow all the other possibilities and claim it's user error if more than one could apply, which seems like an awful design to me. At minimum I'd want to say that the correlation name should be the first choice and wins if there's a match, regardless of anything else, because otherwise there is no safe way for ruleutils to deparse such a construct. And probably function parameter name should be second choice and similarly win over other choices, for the same reason. The other options are SQL92 compatibility holdovers and should only be considered if we can't find a matching correlation or parameter name. The net result of doing it like this, I think, is that we'd accept some cases where SQL99 would prefer to raise an ambiguity error. But we'd still be much closer to the modern standard than we are today. regards, tom lane
Re: Why don't we support external input/output functions for the composite types
From
Robert Haas
Date:
On Fri, Apr 26, 2024 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Well, that would be one way of making the consistency problem be not > our problem, but it would be a sad restriction. It'd void a lot of > the arguable use-case for this feature, if you ask me. I realize > that non-superusers couldn't create the C-language I/O functions that > would be most at risk here, but you could imagine people building > I/O functions in some other PL. Huh, I hadn't considered that. I figured the performance would be too bad to even think about it. I also wasn't even sure such a thing would be supportable: I thought cstrings were generally limited to C/internal functions. > >> I take it that you also don't believe that "2 + 3 * 4" should yield > >> different results from "(2 + 3) * 4"? > > > Isaac's rebuttal to this particular point was perfect; I have nothing to add. > > As far as I could tell, Isaac's rebuttal was completely off-point. OK, I'm not sure why, but let me explain my position. In an expression like (2 + 3) * 4, the parentheses change the order of evaluation, which makes sense. That's what parentheses are for, or at least one thing that parentheses are for. But in an expression like (my_point).x, that's not the case. There's only one operator here, the period, and so there's only one possible order of evaluation, so why do parentheses make any difference? Having (my_point).x be different from my_point.x is like having 2 + 3 give a different answer from (2 + 3), which would be bonkers. But it's not at all like the difference between 2 + 3 * 4 and (2 + 3) * 4. The comparable case there would be foo.bar.baz as against (foo.bar).baz or alternatively foo.(bar.baz). Now there are two dot operators, and one of them has to be applied first, and there's some default based on associativity, and if you want it the other way you stick parentheses in there to tell the parser what you meant. And the reason I thought Isaac put it well is that he said, "In that expression 2 + 3 is not a subexpression, although 3 * 4 is, thanks to the operator precedence rules." Exactly so. 2 + 3 * 4 is going to be parsed as something like OpExpr(+, 2, OpExpr(*, 3, 4)) -- and that does not have OpExpr(+, 2, 3) anywhere inside of it, so my comment that parenthesizing a subexpression shouldn't change its meaning is not relevant here. I'm perfectly fine with parentheses changing which things we parse as subexpressions. Users have no license to just stick parentheses into your SQL expressions in random places and expect that they don't do anything; if that were so, we'd have to make ((2 +) 3)()()() evaluate to 5, which is obviously nonsense. Rather, what I don't like about the status quo is that putting parentheses around something that we were already going to consider as a unit changes the meaning of it. And that's exactly what happens when you write x.y vs. (x).y. The parentheses around the x make us think that it's a different kind of thing, somehow. That seems odd, and the practical result is that you have to insert a bunch of parentheses into your PostgreSQL code that look like they shouldn't be needed, but are. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Robert Haas
Date:
On Fri, Apr 26, 2024 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I'm not familiar with these rules. Do they allow stuff like a.b.c.d.e, > > or better yet, a.b(args).c(args).d(args).e(args)? > > The former. > > <identifier chain> ::= > <identifier> [ { <period> <identifier> }... ] OK, nice. > The hard part is to figure out what the first identifier is: > column name? table correlation name (AS name)? schema name or > catalog name of a qualified table name? function parameter name? > After that, as long as what you have is of composite type, > you can drill down into it. Right, makes sense. > If I'm reading SQL99 correctly, they deny allowing the first > identifier to be a column name when there's more than one identifier, > so that you must table-qualify a composite column before you can > select a field from it. But they allow all the other possibilities > and claim it's user error if more than one could apply, which seems > like an awful design to me. At minimum I'd want to say that the > correlation name should be the first choice and wins if there's > a match, regardless of anything else, because otherwise there is > no safe way for ruleutils to deparse such a construct. And > probably function parameter name should be second choice and > similarly win over other choices, for the same reason. The other > options are SQL92 compatibility holdovers and should only be > considered if we can't find a matching correlation or parameter name. I definitely agree that there must always be some way to make it unambiguous, not just because of deparsing but also because users are going to want a way to force their preferred interpretation. I've been a PostgreSQL developer now for considerably longer than I was an end user, but I definitely would not have liked "ERROR: you can't get there from here". I'm less certain how that should be spelled. The rules you propose make sense to me up to a point, but what happens if the same unqualified name is both a table alias and a function parameter name? I think I need a way of forcing the function-parameter interpretation. You could make function_name.parameter_name resolve to that, but then what happens if function_name is also a table alias in the containing query? It's really hard to think of a set of rules here that don't leave any room for unfixable problems. Maybe the answer is that we should support some completely different notion for unambiguously referencing parameters, like ${parameter_name}. I don't know. I think that what you're proposing here could be a nice improvement but it definitely seems tricky to get it completely right. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Apr 26, 2024 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Well, that would be one way of making the consistency problem be not >> our problem, but it would be a sad restriction. It'd void a lot of >> the arguable use-case for this feature, if you ask me. I realize >> that non-superusers couldn't create the C-language I/O functions that >> would be most at risk here, but you could imagine people building >> I/O functions in some other PL. > Huh, I hadn't considered that. I figured the performance would be too > bad to even think about it. I also wasn't even sure such a thing would > be supportable: I thought cstrings were generally limited to > C/internal functions. Performance could indeed be an issue, but I think people taking this path would be doing so because they value programmer time more than machine time. And while there once were good reasons to not let user functions deal in cstrings, I'm not sure there are anymore. (The point would deserve closer investigation if we actually tried to move forward on it, of course.) > Rather, what I > don't like about the status quo is that putting parentheses around > something that we were already going to consider as a unit changes the > meaning of it. And that's exactly what happens when you write x.y vs. > (x).y. But that's exactly the point: we DON'T consider the initial identifier of a qualified name "as a unit", and neither does the standard. We have to figure out how many of the identifiers name an object (column or table) referenced in the query, and that is not clear up-front. SQL99's rules don't make that any better. The parens in our current notation serve to separate the object name from any field selection(s) done on the object. There's still some ambiguity, because we allow you to write either "(table.column).field" or "(table).column.field", but we've dealt with that for ages. regards, tom lane
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Apr 26, 2024 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If I'm reading SQL99 correctly, they deny allowing the first >> identifier to be a column name when there's more than one identifier, >> so that you must table-qualify a composite column before you can >> select a field from it. But they allow all the other possibilities >> and claim it's user error if more than one could apply, which seems >> like an awful design to me. > I'm less certain how that should be spelled. The rules you propose > make sense to me up to a point, but what happens if the same > unqualified name is both a table alias and a function parameter name? > I think I need a way of forcing the function-parameter interpretation. > You could make function_name.parameter_name resolve to that, but then > what happens if function_name is also a table alias in the containing > query? It's really hard to think of a set of rules here that don't > leave any room for unfixable problems. Maybe the answer is that we > should support some completely different notion for unambiguously > referencing parameters, like ${parameter_name}. I don't know. I think > that what you're proposing here could be a nice improvement but it > definitely seems tricky to get it completely right. I think you're moving the goal posts too far. It's on the user to spell the initially-written query unambiguously: if you chose a function parameter name that matches a table correlation name in the query, that's your fault and you'd better rename one of those things. What concerns me is the hazard that the query is okay, and we store it, and then subsequent object creations or renamings create a situation where an identifier chain is ambiguous per the SQL99 rules. ruleutils has to be able to deparse the stored query in a way that is valid regardless of that. Giving first priority to correlation and parameter names makes this possible because external operations, even including renaming tables or columns used in the query, won't affect either. regards, tom lane PS: ruleutils does sometimes choose new correlation names, and it suddenly occurs to me that it's probably not being careful to avoid duplicating function parameter names. But that's independent of this discussion.
Re: Why don't we support external input/output functions for the composite types
From
Robert Haas
Date:
On Fri, Apr 26, 2024 at 12:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > But that's exactly the point: we DON'T consider the initial identifier > of a qualified name "as a unit", and neither does the standard. > We have to figure out how many of the identifiers name an object > (column or table) referenced in the query, and that is not clear > up-front. SQL99's rules don't make that any better. The parens in > our current notation serve to separate the object name from any field > selection(s) done on the object. There's still some ambiguity, > because we allow you to write either "(table.column).field" or > "(table).column.field", but we've dealt with that for ages. I agree that this is exactly the point. No other programming language that I know of, and no other database that I know of, looks at x.y.z and says "ok, well first we have to figure out whether the object is named x or x.y or x.y.z, and then after that, we'll use whatever is left over as a field selector." Instead, they have a top-level namespace where x refers to one and only one thing, and then they look for something called y inside of that, and if that's a valid object then they look inside of that for z. JavaScript is probably the purest example of this. Everything is an object, and x.y just looks up 'x' in the object that is the current namespace. Assuming that returns an object rather than nothing, we then try to find 'y' inside of that object. I'm not an Oracle expert, but I am under the impression that the way that Oracle works is closer to that than it is to our read-the-tea-leaves approach. I'm almost positive you're about to tell me that there's no way in the infernal regions that we could make a semantics change of this magnitude, and maybe you're right. But I think our current approach is deeply unsatisfying and seriously counterintuitive. People do not get an error about x.y and think "oh, right, I need to write (x).y so that the parser understands that the name is x rather than x.y and the .y part is field-selection rather than a part of the name itself." They get an error about x.y and say "crap, I guess this syntax isn't supported" and then when you show them that "(x).y" fixes it, they say "why in the world does that fix it?" or "wow, that's dumb." Imagine if we made _ perform string concatenation but also continued to allow it as an identifier character. When we saw a_b without spaces, we'd test for whether there's an a_b variable, and/or whether there are a and b variables, to guess which interpretation was meant. I hope we would all agree that this would be insane language design. Yet that's essentially what we've done with period, and I don't think we can blame that on the SQL standard, because I don't think other systems have this problem. I wonder if anyone knows of another system that works like PostgreSQL in this regard (without sharing code). -- Robert Haas EDB: http://www.enterprisedb.com
Re: Why don't we support external input/output functions for the composite types
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > No other programming language that I know of, and no other database > that I know of, looks at x.y.z and says "ok, well first we have to > figure out whether the object is named x or x.y or x.y.z, and then > after that, we'll use whatever is left over as a field selector." It may indeed be true that nobody but SQL does that, but nonetheless this is exactly what SQL99 requires AFAICT. The reason we use this parenthesis notation is precisely that we didn't want to get into that sort of tea-leaf-reading about how many identifiers mean what. The parens put it on the user to tell us what part of the chain is field selection. Now do you see why I'd prefer to ditch the SQL92-compatibility measures? If we said that the first identifier in a chain must be a correlation name or parameter name, never anything else, it'd be substantially saner. > Yet that's essentially what we've done with period, and I don't think > we can blame that on the SQL standard Yes, we can. Please do not rant further about this until you've read the <identifier chain> section of a recent SQL spec. regards, tom lane
Re: Why don't we support external input/output functions for the composite types
From
Isaac Morland
Date:
On Fri, 26 Apr 2024 at 14:04, Robert Haas <robertmhaas@gmail.com> wrote:
systems have this problem. I wonder if anyone knows of another system
that works like PostgreSQL in this regard (without sharing code).
In Haskell period (".") is used both to form a qualified name (module.name), very similar to our schema.object, and it is also a perfectly normal operator which is defined by the standard prelude as function composition (but you can re-bind it to any object whatsoever). This is disambiguated in a very simple way however: Module names must begin with an uppercase letter while variable names must begin with a lowercase letter.
A related point is that parentheses in Haskell act to group expressions, but they, and commas, are not involved in calling functions: to call a function, just write it to the left of its parameter (and it only has one parameter, officially).
All this might sound weird but it actually works very well in the Haskell context.
Re: Why don't we support external input/output functions for the composite types
From
Robert Haas
Date:
On Fri, Apr 26, 2024 at 2:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > No other programming language that I know of, and no other database > > that I know of, looks at x.y.z and says "ok, well first we have to > > figure out whether the object is named x or x.y or x.y.z, and then > > after that, we'll use whatever is left over as a field selector." > > It may indeed be true that nobody but SQL does that, but nonetheless > this is exactly what SQL99 requires AFAICT. The reason we use this > parenthesis notation is precisely that we didn't want to get into > that sort of tea-leaf-reading about how many identifiers mean what. > The parens put it on the user to tell us what part of the chain > is field selection. I really thought this was just PostgreSQL, not SQL generally, but I just experimented a bit with Oracle on dbfiddle.uk using this example: CREATE TYPE foo AS OBJECT (a number(10), b varchar2(2000)); CREATE TABLE bar (quux foo); INSERT INTO bar VALUES (foo(1, 'one')); SELECT bar.quux, quux, (quux).a, (bar.quux).a FROM bar; This works, but if I delete the parentheses from the last line, then it fails. So evidently my understanding of how this works in other systems is incorrect, or incomplete. I feel like I've encountered cases where we required extra parenthesization that Oracle didn't need, but it's hard to discuss that without examples, and I don't have them right now. > Yes, we can. Please do not rant further about this until you've > read the <identifier chain> section of a recent SQL spec. I'm hurt to see emails that I spent time on characterized as a rant, even if I was wrong on the facts. And I think appealing to the SQL standard is a poor way of trying to end debate on a topic. -- Robert Haas EDB: http://www.enterprisedb.com