Thread: Proposition for autoname columns
Hello Pgsql-hackers,

When selecting data from json column it named as '?column?'

tucha=# select info->>'suma', docn from document order by id desc limit 5;
 ?column? | docn
----------+------
 665.97   |  695
 513.51   |  632
 665.97   | 4804
 492.12   | 4315
 332.98   | 1302
(5 rows)

It would be useful if the name of column will be autoassigned based on
name of json key. Like at next query:

tucha=# select info->>'suma' as suma, docn from document order by id desc limit 5;
  suma  | docn
--------+------
 665.97 |  695
 513.51 |  632
 665.97 | 4804
 492.12 | 4315
 332.98 | 1302
(5 rows)

Would it be useful this auto assigned name for column from json?

--
Best regards,
Eugen Konkov
On Mon, Nov 2, 2020 at 05:05:29PM +0200, Eugen Konkov wrote:
> Hello Pgsql-hackers,
>
> When selecting data from json column it named as '?column?'
> tucha=# select info->>'suma', docn from document order by id desc limit 5;
>  ?column? | docn
> ----------+------
>  665.97   |  695
>  513.51   |  632
>  665.97   | 4804
>  492.12   | 4315
>  332.98   | 1302
> (5 rows)
>
> It would be useful if the name of column will be autoassigned based on
> name of json key. Like at next query:
>
> tucha=# select info->>'suma' as suma, docn from document order by id desc limit 5;
>   suma  | docn
> --------+------
>  665.97 |  695
>  513.51 |  632
>  665.97 | 4804
>  492.12 | 4315
>  332.98 | 1302
> (5 rows)
>
> Would it be useful this auto assigned name for column from json?

I think we could do it, but it would only work if the column was output
as a single json value, and not a multi-key/value field. I am afraid if
we tried to do it, the result would be too inconsistent to be useful.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Nov 11, 2020 at 8:56 AM Bruce Momjian <bruce@momjian.us> wrote:
> It would be useful if the name of column will be autoassigned based on
> name of json key. Like at next query:
>
> tucha=# select info->>'suma' as suma, docn from document order by id desc limit 5;
> suma | docn
> --------+------
> Would it be useful this auto assigned name for column from json?
>
> I think we could do it, but it would only work if the column was output
> as a single json value, and not a multi-key/value field. I am afraid if
> we tried to do it, the result would be too inconsistent to be useful.
Doing it seems problematic given the nature of SQL and existing means to assign names to columns. If it can be done I don't see how the output value would make any difference. What is being asked for is the simple textual value on the right side of the ->> (and other similar) operators to be converted into a column name. I could imagine doing this at rewrite time by saying (in parse terms):
info->>'suma to' becomes info->>'suma to' AS "suma to" (specifically, add AS, double-quote the literal and stick it after the AS).
If {AS "suma to"} isn't valid syntax for some value of "suma to" just drop the attempt and move on.
I agree that this feature would be useful.
David J.
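
The textual rule in the message above (add AS, double-quote the literal, give up if the alias would not be valid) can be sketched as a small Python model. This is only an illustration of the idea as described, not PostgreSQL code: the helper names `quote_ident` and `add_default_alias` are hypothetical, the regular expression only recognizes the simplest `column ->> 'literal'` form, and the only invalid case it handles is an empty key.

```python
import re

# Hypothetical model of the proposed rewrite; none of this is PostgreSQL code.
# Recognizes only the simplest form:  <column> ->> '<string literal>'
_JSON_KEY_EXPR = re.compile(r"^(?P<lhs>\w+)\s*->>\s*'(?P<key>(?:[^']|'')*)'$")


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def add_default_alias(select_item: str) -> str:
    """Append AS "<key>" to  col ->> 'key'; return anything else unchanged."""
    item = select_item.strip()
    match = _JSON_KEY_EXPR.match(item)
    if match is None:
        return select_item  # not a constant-key lookup: drop the attempt
    key = match.group("key").replace("''", "'")  # undo SQL quote doubling
    if key == "":
        return select_item  # an empty alias is not valid: drop the attempt
    return f"{item} AS {quote_ident(key)}"


print(add_default_alias("info->>'suma to'"))
print(add_default_alias("info->>x"))
```

With this model, a constant key yields an explicit alias and a variable key is left alone, which is the "drop the attempt and move on" behaviour described above.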
Hello Bruce,

Wednesday, November 11, 2020, 5:56:08 PM, you wrote:

> On Mon, Nov 2, 2020 at 05:05:29PM +0200, Eugen Konkov wrote:
>> Hello Pgsql-hackers,
>>
>> When selecting data from json column it named as '?column?'
>> tucha=# select info->>'suma', docn from document order by id desc limit 5;
>>  ?column? | docn
>> ----------+------
>>  665.97   |  695
>>  513.51   |  632
>>  665.97   | 4804
>>  492.12   | 4315
>>  332.98   | 1302
>> (5 rows)
>>
>> It would be useful if the name of column will be autoassigned based on
>> name of json key. Like at next query:
>>
>> tucha=# select info->>'suma' as suma, docn from document order by id desc limit 5;
>>   suma  | docn
>> --------+------
>>  665.97 |  695
>>  513.51 |  632
>>  665.97 | 4804
>>  492.12 | 4315
>>  332.98 | 1302
>> (5 rows)
>>
>> Would it be useful this auto assigned name for column from json?

> I think we could do it, but it would only work if the column was output
> as a single json value, and not a multi-key/value field. I am afraid if
> we tried to do it, the result would be too inconsistent to be useful.

cool, thank you.

--
Best regards,
Eugen Konkov
Bruce Momjian <bruce@momjian.us> writes:

> On Mon, Nov 2, 2020 at 05:05:29PM +0200, Eugen Konkov wrote:
>> Hello Pgsql-hackers,
>>
>> When selecting data from json column it named as '?column?'
>> tucha=# select info->>'suma', docn from document order by id desc limit 5;
>>  ?column? | docn
>> ----------+------
>>  665.97   |  695
>>  513.51   |  632
>>  665.97   | 4804
>>  492.12   | 4315
>>  332.98   | 1302
>> (5 rows)
>>
>> It would be useful if the name of column will be autoassigned based on
>> name of json key. Like at next query:
>>
>> tucha=# select info->>'suma' as suma, docn from document order by id desc limit 5;
>>   suma  | docn
>> --------+------
>>  665.97 |  695
>>  513.51 |  632
>>  665.97 | 4804
>>  492.12 | 4315
>>  332.98 | 1302
>> (5 rows)
>>
>> Would it be useful this auto assigned name for column from json?
>
> I think we could do it, but it would only work if the column was output
> as a single json value, and not a multi-key/value field. I am afraid if
> we tried to do it, the result would be too inconsistent to be useful.

Could this be done via the support function, so that the top-level
operator/function in each select list item can return a suggested column
name if the relevant arguments are constants?

- ilmari
--
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least. - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article. - Calle Dybedahl
On Thu, Nov 12, 2020 at 12:18:49AM +0000, Dagfinn Ilmari Mannsåker wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I think we could do it, but it would only work if the column was output
> > as a single json value, and not a multi-key/value field. I am afraid if
> > we tried to do it, the result would be too inconsistent to be useful.
>
> Could this be done via the support function, so that the top-level
> operator/function in each select list item can return a suggested column
> name if the relevant arguments are constants?

Yes, the user explicitly calling a function would be much easier to
predict.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Nov 11, 2020 at 5:56 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Nov 12, 2020 at 12:18:49AM +0000, Dagfinn Ilmari Mannsåker wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > I think we could do it, but it would only work if the column was output
> > > as a single json value, and not a multi-key/value field. I am afraid if
> > > we tried to do it, the result would be too inconsistent to be useful.
> >
> > Could this be done via the support function, so that the top-level
> > operator/function in each select list item can return a suggested column
> > name if the relevant arguments are constants?
>
> Yes, the user explicitly calling a function would be much easier to
> predict.
For the user an operator and a function are different ways to invoke the same underlying thing using different syntax. I'm not seeing how this syntax difference makes this any easier to implement for explicit function invocation compared to operator function invocation.
David J.
On 11/11/20 7:55 PM, Bruce Momjian wrote:
> On Thu, Nov 12, 2020 at 12:18:49AM +0000, Dagfinn Ilmari Mannsåker wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> I think we could do it, but it would only work if the column was output
>>> as a single json value, and not a multi-key/value field. I am afraid if
>>> we tried to do it, the result would be too inconsistent to be useful.
>> Could this be done via the support function, so that the top-level
>> operator/function in each select list item can return a suggested column
>> name if the relevant arguments are constants?
> Yes, the user explicitly calling a function would be much easier to
> predict.
>

I suspect this is doomed to failure. There is no guarantee that the path
expression is going to be static or constant across rows. Say you have
this table:

x: foo, j: {"foo": 1, "bar": 2}
x: bar j: {"foo": 3, "bar": 4}

and you say:

select j->>x from mytable;

What should the column be named?

I think we'd be trying to manage a set of corner cases, and all because
someone didn't want to put "as foo" in their query. And if we generate a
column name in some cases and not in others there will be complaints of
inconsistency.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hello Andrew,

Thursday, November 12, 2020, 3:19:39 PM, you wrote:

> On 11/11/20 7:55 PM, Bruce Momjian wrote:
>> On Thu, Nov 12, 2020 at 12:18:49AM +0000, Dagfinn Ilmari Mannsåker wrote:
>>> Bruce Momjian <bruce@momjian.us> writes:
>>>> I think we could do it, but it would only work if the column was output
>>>> as a single json value, and not a multi-key/value field. I am afraid if
>>>> we tried to do it, the result would be too inconsistent to be useful.
>>> Could this be done via the support function, so that the top-level
>>> operator/function in each select list item can return a suggested column
>>> name if the relevant arguments are constants?
>> Yes, the user explicitly calling a function would be much easier to
>> predict.
>>

> I suspect this is doomed to failure. There is no guarantee that the path
> expression is going to be static or constant across rows. Say you have
> this table:

> x: foo, j: {"foo": 1, "bar": 2}
> x: bar j: {"foo": 3, "bar": 4}

> and you say:

> select j->>x from mytable;

> What should the column be named?

Suppose it should be named 'as x'

> I think we'd be trying to manage a set of corner cases, and all because
> someone didn't want to put "as foo" in their query. And if we generate a
> column name in some cases and not in others there will be complaints of
> inconsistency.

> cheers

> andrew

> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

--
Best regards,
Eugen Konkov
On Thu, Nov 12, 2020 at 7:18 AM Eugen Konkov <kes-kes@yandex.ru> wrote:
> Hello Andrew,
>
> Thursday, November 12, 2020, 3:19:39 PM, you wrote:
>
> > On 11/11/20 7:55 PM, Bruce Momjian wrote:
> > select j->>x from mytable;
> > What should the column be named?
>
> Suppose it should be named 'as x'

+1

> > I think we'd be trying to manage a set of corner cases, and all because
> > someone didn't want to put "as foo" in their query. And if we generate a
> > column name in some cases and not in others there will be complaints of
> > inconsistency.
Yes, this is suggesting a behavior that is contrary to (but not prohibited by) the natural expression and expectations of SQL. That said, we already take a function's name and use it to specify the name of its output column as opposed to using "?column?" and requiring a user to apply a specific alias. This is only a step beyond that, choosing the default name for an operator's output column based upon not the name of the operator (or its underlying function) but based upon its one (and only possible) right-hand argument. It is purely a user convenience feature and can be rejected on those grounds but I'm not seeing any fundamental issue with only having some operator combinations doing this. It's nice when it works and you are no worse off than today when it doesn't.
David J.
On 11/12/20 9:14 AM, Eugen Konkov wrote:
> Hello Andrew,
>
> Thursday, November 12, 2020, 3:19:39 PM, you wrote:
>
>
>> On 11/11/20 7:55 PM, Bruce Momjian wrote:
>>> On Thu, Nov 12, 2020 at 12:18:49AM +0000, Dagfinn Ilmari Mannsåker wrote:
>>>> Bruce Momjian <bruce@momjian.us> writes:
>>>>> I think we could do it, but it would only work if the column was output
>>>>> as a single json value, and not a multi-key/value field. I am afraid if
>>>>> we tried to do it, the result would be too inconsistent to be useful.
>>>> Could this be done via the support function, so that the top-level
>>>> operator/function in each select list item can return a suggested column
>>>> name if the relevant arguments are constants?
>>> Yes, the user explicitly calling a function would be much easier to
>>> predict.
>>>
>> I suspect this is doomed to failure. There is no guarantee that the path
>> expression is going to be static or constant across rows. Say you have
>> this table:
>> x: foo, j: {"foo": 1, "bar": 2}
>> x: bar j: {"foo": 3, "bar": 4}
>> and you say:
>> select j->>x from mytable;
>> What should the column be named?
> Suppose it should be named 'as x'

So if we then say:

select x, j->>x from mytable;

you want both result columns named x? That seems like a recipe for
serious confusion. I really don't think this proposal has been properly
thought through.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Thu, Nov 12, 2020 at 16:59, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 11/12/20 9:14 AM, Eugen Konkov wrote:
> > Hello Andrew,
> >
> > Thursday, November 12, 2020, 3:19:39 PM, you wrote:
> >
> >> On 11/11/20 7:55 PM, Bruce Momjian wrote:
> >>> On Thu, Nov 12, 2020 at 12:18:49AM +0000, Dagfinn Ilmari Mannsåker wrote:
> >>>> Bruce Momjian <bruce@momjian.us> writes:
> >>>>> I think we could do it, but it would only work if the column was output
> >>>>> as a single json value, and not a multi-key/value field. I am afraid if
> >>>>> we tried to do it, the result would be too inconsistent to be useful.
> >>>> Could this be done via the support function, so that the top-level
> >>>> operator/function in each select list item can return a suggested column
> >>>> name if the relevant arguments are constants?
> >>> Yes, the user explicitly calling a function would be much easier to
> >>> predict.
> >>>
> >> I suspect this is doomed to failure. There is no guarantee that the path
> >> expression is going to be static or constant across rows. Say you have
> >> this table:
> >> x: foo, j: {"foo": 1, "bar": 2}
> >> x: bar j: {"foo": 3, "bar": 4}
> >> and you say:
> >> select j->>x from mytable;
> >> What should the column be named?
> > Suppose it should be named 'as x'
>
> So if we then say:
>
> select x, j->>x from mytable;
>
> you want both result columns named x? That seems like a recipe for
> serious confusion. I really don't think this proposal has been properly
> thought through.
Why? It is consistent: you get the value of the key x, as anybody would expect, so the value can be different.
Regards
Pavel
> cheers
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
On Thu, Nov 12, 2020 at 8:59 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> So if we then say:
>
> select x, j->>x from mytable;
>
> you want both result columns named x? That seems like a recipe for
> serious confusion. I really don't think this proposal has been properly
> thought through.
IMO It no worse than today's:
select count(*), count(*) from (values (1), (2)) vals (v);
count | count
2 | 2
David J.
On 11/12/20 11:12 AM, David G. Johnston wrote:
> On Thu, Nov 12, 2020 at 8:59 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>
> So if we then say:
>
> select x, j->>x from mytable;
>
> you want both result columns named x? That seems like a recipe for
> serious confusion. I really don't think this proposal has been
> properly thought through.
>
> IMO It no worse than today's:
>
> select count(*), count(*) from (values (1), (2)) vals (v);
> count | count
> 2 | 2
>

I guess the difference here is that there's an extra level of
indirection. So

select x, j->>'x', j->>x from mytable

would have 3 result columns all named x.

cheers

andrew
On Thu, Nov 12, 2020 at 11:32:49AM -0500, Andrew Dunstan wrote:
> On 11/12/20 11:12 AM, David G. Johnston wrote:
> > IMO It no worse than today's:
> >
> > select count(*), count(*) from (values (1), (2)) vals (v);
> > count | count
> > 2 | 2
> >
>
> I guess the difference here is that there's an extra level of
> indirection. So
>
> select x, j->>'x', j->>x from mytable
>
> would have 3 result columns all named x.

Yeah, I feel it would have to be something a user specifically asks for,
and we would have to say it would be the first or a random match of one
of the keys. Ultimately, it might be so awkward as to be useless.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
On Thu, Nov 12, 2020 at 9:32 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> On 11/12/20 11:12 AM, David G. Johnston wrote:
> > On Thu, Nov 12, 2020 at 8:59 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> >
> > So if we then say:
> >
> > select x, j->>x from mytable;
> >
> > you want both result columns named x? That seems like a recipe for
> > serious confusion. I really don't think this proposal has been
> > properly thought through.
> >
> > IMO It no worse than today's:
> >
> > select count(*), count(*) from (values (1), (2)) vals (v);
> > count | count
> > 2 | 2
> >
> I guess the difference here is that there's an extra level of
> indirection. So
>
> select x, j->>'x', j->>x from mytable
>
> would have 3 result columns all named x.
I totally missed the variable reference there - only two of those become "x", the variable reference stays un-rewritten and thus results in "?column?", similar to today:
select count(*), count(*) +1 from (values (1), (2)) vals (v);
count | ?column?
2 | 3
The query rewriter would only rewrite these expressions and provide an expression-related explicit alias clause if the expression is a single operator (same as single function today) and the right-hand side of the operator is a constant (meaning the constant is a reasonable representation of every output value that is going to appear in the result column). If the RHS is a variable then there is no good name that is known to cover all output values and thus ?column? (i.e., do not rewrite/provide an alias clause) is an appropriate choice.
My concerns in this area involve stored views and ruleutils, dump/reload by extension. Greenfield, this would have been nice, and worth the minimal complexity given its usefulness in the common case, but is it useful enough to introduce a whole new default naming mechanism and dealing with dump/restore concerns?
David J.
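
The rule described in the message above (name the column after the right-hand side only when it is a constant, otherwise fall back to ?column?) can be modelled in a few lines of Python. This is a toy sketch, not the PostgreSQL implementation: the classes `Const`, `Column` and `JsonGet` and the function `default_name` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy expression types; they do not correspond to PostgreSQL nodes.
@dataclass(frozen=True)
class Const:
    value: str


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class JsonGet:
    """Models  lhs ->> rhs."""
    lhs: Column
    rhs: Union[Const, Column]


def default_name(item) -> str:
    """Default output column name under the rule sketched in the message above."""
    if isinstance(item, Column):
        return item.name            # plain column reference keeps its own name
    if isinstance(item, JsonGet) and isinstance(item.rhs, Const):
        return item.rhs.value       # constant key: one name covers every row
    return "?column?"               # variable key or anything else: no good name


# select x, j->>'x', j->>x from mytable
select_list = [
    Column("x"),
    JsonGet(Column("j"), Const("x")),
    JsonGet(Column("j"), Column("x")),
]
names = [default_name(item) for item in select_list]
print(names)  # ['x', 'x', '?column?']
```

Under this model the three-column query from the earlier message produces two columns named x and one ?column?, matching the correction made at the start of this message.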
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> The query rewriter would only rewrite these expressions and provide an
> expression-related explicit alias clause if the expression is a single
> operator (same as single function today) and the right-hand side of the
> operator is a constant (meaning the constant is a reasonable representation
> of every output value that is going to appear in the result column).

I haven't been paying too close attention to this thread, but it seems
like there is a lot of misapprehension here about how this could
reasonably be implemented. There is zero (not epsilon, but zero)
chance of changing column aliases at rewrite time. Those have to be
assigned in the parser, else we will not understand how to resolve
references to sub-select output columns. Specifically it has to happen
in FigureColname(), which means that resolving non-constant arguments
to constants isn't terribly practical.

Actually, since FigureColname() works on the raw parse tree, I'm not
even sure how you could make this happen in that context, unless you're
willing to say that "j ->> 'x'" resolves as "x" just based on the name
of the operator, without any info about its semantics. That doesn't
seem very cool.

Now, in a quick look at the callers, it looks like it'd be no problem
from the callers' standpoint to switch things around to do colname
selection on the parsed tree instead, ie the existing choice is for
FigureColname's benefit not the callers'. But it'd likely cost a good
deal to do it the other way, since now FigureColname would need to
perform catalog lookups to get column and function names.

Maybe you could do something like passing *both* trees to FigureColname,
and let it obtain the actual operator OID from the parsed tree when the
raw tree contains AEXPR_OP. But the recursion in FigureColname would be
difficult to manage because the two trees often don't match one-to-one.

On the whole, I'm on the side of the people who don't want to change this.
The implementation cost seems likely to greatly outweigh the value, plus
it feels more like a wart than a feature.

regards, tom lane
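
The distinction drawn above between the raw parse tree and the parsed tree can be illustrated with a toy Python model. It is not how FigureColname() is written: `RawOpExpr`, `AnalyzedOpExpr`, `name_from_raw` and `name_from_analyzed` are hypothetical, and the `implementing_function` string merely stands in for the operator OID that only becomes available after parse analysis. The two function names in `JSON_KEY_LOOKUPS` are, to the best of my knowledge, the functions behind the built-in json and jsonb `->>` (text key) operators; treat that as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawOpExpr:
    """Raw parse tree: only the operator's spelling is known."""
    opname: str
    rhs_literal: str


@dataclass(frozen=True)
class AnalyzedOpExpr:
    """After analysis: the operator is resolved to one implementation."""
    opname: str
    implementing_function: str  # hypothetical stand-in for an operator OID
    rhs_literal: str


# Assumed names of the functions behind the built-in ->> operators.
JSON_KEY_LOOKUPS = {"json_object_field_text", "jsonb_object_field_text"}


def name_from_raw(node: RawOpExpr) -> str:
    # Only the spelling can be checked, so any operator named ->> matches.
    return node.rhs_literal if node.opname == "->>" else "?column?"


def name_from_analyzed(node: AnalyzedOpExpr) -> str:
    # The semantics are known, so unrelated operators named ->> are excluded.
    if node.implementing_function in JSON_KEY_LOOKUPS:
        return node.rhs_literal
    return "?column?"


# A user-defined operator that merely happens to be spelled ->>
raw_custom = RawOpExpr("->>", "x")
analyzed_custom = AnalyzedOpExpr("->>", "my_custom_op", "x")
analyzed_jsonb = AnalyzedOpExpr("->>", "jsonb_object_field_text", "x")

print(name_from_raw(raw_custom))
print(name_from_analyzed(analyzed_custom))
print(name_from_analyzed(analyzed_jsonb))
```

The model shows why naming from the raw tree would apply to every operator spelled `->>` regardless of what it does, while naming from the analyzed tree needs the extra lookup whose cost is the concern raised above.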
On Thu, Nov 12, 2020 at 01:52:11PM -0500, Tom Lane wrote:
> On the whole, I'm on the side of the people who don't want to change this.
> The implementation cost seems likely to greatly outweigh the value, plus
> it feels more like a wart than a feature.

I think we can mark this as, "We thought about it, and we decided it is
probably not a good idea."

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
On Thursday, November 12, 2020, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Nov 12, 2020 at 01:52:11PM -0500, Tom Lane wrote:
> > On the whole, I'm on the side of the people who don't want to change this.
> > The implementation cost seems likely to greatly outweigh the value, plus
> > it feels more like a wart than a feature.
>
> I think we can mark this as, "We thought about it, and we decided it is
> probably not a good idea."
+1
David J.
On 2020-Nov-12, Tom Lane wrote:

> On the whole, I'm on the side of the people who don't want to change this.
> The implementation cost seems likely to greatly outweigh the value, plus
> it feels more like a wart than a feature.

I think if Eugen wants to spend some time with it and see how it could
be implemented, then sent a patch for consideration, then we could make
a better informed decision. My own opinion is that it's not worth the
trouble, but I'd rather us not stand in his way if he wants to try
(With disclaimer that we might end up not liking the patch, of course).
On Thu, Nov 12, 2020 at 04:30:15PM -0300, Álvaro Herrera wrote:
> On 2020-Nov-12, Tom Lane wrote:
>
> > On the whole, I'm on the side of the people who don't want to change this.
> > The implementation cost seems likely to greatly outweigh the value, plus
> > it feels more like a wart than a feature.
>
> I think if Eugen wants to spend some time with it and see how it could
> be implemented, then sent a patch for consideration, then we could make
> a better informed decision. My own opinion is that it's not worth the
> trouble, but I'd rather us not stand in his way if he wants to try
> (With disclaimer that we might end up not liking the patch, of course).

I think he would be better outlining how he wants it to behave before
even working on a patch; from our TODO list:

    Desirability -> Design -> Implement -> Test -> Review -> Commit

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
> On 2020-Nov-12, Tom Lane wrote:

>> On the whole, I'm on the side of the people who don't want to change this.
>> The implementation cost seems likely to greatly outweigh the value, plus
>> it feels more like a wart than a feature.

> I think if Eugen wants to spend some time with it and see how it could
> be implemented, then sent a patch for consideration, then we could make
> a better informed decision. My own opinion is that it's not worth the
> trouble, but I'd rather us not stand in his way if he wants to try
> (With disclaimer that we might end up not liking the patch, of course).

Sorry, I am not a C/C++ programmer and cannot imagine how to start on a
patch. I do not know the internals of PG. The only useful thing I can
offer is the idea itself.

I suppose that initially there was only ?column?, and later names were
implemented for count, sum, etc. It would be cool if PG went a step
further and named sum( a ) as sum_a instead of just sum.

The purpose of this proposal is not correct name generation; the purpose
is to get more distinct default names:

?column?, ?column?, ?column?, ?column?, ?column?, ?column?, ?column?,
?count?, ?count?, ?count?, ?sum?, ?sum?, ?sum?, ?sum?
?count_a?, ?count_b?, ?count_c?, ?sum_a?, ?sum_b?, ?sum_c?, ?sum_d?

Notice that the last is more robust than the first ;-)

I suppose we just ignore complex cases and leave them as they currently
are. We could try a very small step in the direction of improving
default names and see feedback from users on whether it is useful. Then
we can decide whether it is worth implementing a whole system for
default name generation.

Unfortunately I am no judge of the level at which this should occur:
parser, analyzer or so. I just do not understand those things =(

Thank you.

--
Best regards,
Eugen Konkov