Thread: patch: bytea_agg
Hello this patch adds a bytea_agg aggregation. It allow fast bytea concatetation. Regards Pavel Stehule
Attachment
On Wed, Dec 21, 2011 at 5:04 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > this patch adds a bytea_agg aggregation. > > It allow fast bytea concatetation. Looks fine to me. I'll commit this, barring objections. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 22, 2011 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Dec 21, 2011 at 5:04 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> this patch adds a bytea_agg aggregation. >> >> It allow fast bytea concatetation. > > Looks fine to me. I'll commit this, barring objections. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: > this patch adds a bytea_agg aggregation. > > It allow fast bytea concatetation. Why not call it string_agg? All the function names are the same between text and bytea (e.g., ||, substr, position, length). It would be nice not to introduce arbitrary differences.
Hello 2011/12/23 Peter Eisentraut <peter_e@gmx.net>: > On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: >> this patch adds a bytea_agg aggregation. >> >> It allow fast bytea concatetation. > > Why not call it string_agg? All the function names are the same between > text and bytea (e.g., ||, substr, position, length). It would be nice > not to introduce arbitrary differences. My opinion is not too strong. I don't think so using string_agg is good name (for bytea_agg) - as minimal (and only one) reason is different API - there is no support for delimiter. If I remember well discussion about string_agg, where delimiter is not optimal, there is request for immutable interface for aggregates - there was a issue with ORDER clause. So bytea_agg is good name. Regards Pavel >
Hello 2011/12/23 Peter Eisentraut <peter_e@gmx.net>: > On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: >> this patch adds a bytea_agg aggregation. >> >> It allow fast bytea concatetation. > > Why not call it string_agg? All the function names are the same between > text and bytea (e.g., ||, substr, position, length). It would be nice > not to introduce arbitrary differences. My opinion is not strong. I don't think so using string_agg is good name (- as minimal (and only one) reason is different API - there is no support for delimiter. If I remember well discussion about string_agg, where delimiter is not optimal, there is request for immutable interface for aggregates - there was a issue with ORDER clause. So bytea_agg is good name. Regards Pavel >
On Fri, Dec 23, 2011 at 12:51 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: >> this patch adds a bytea_agg aggregation. >> >> It allow fast bytea concatetation. > > Why not call it string_agg? All the function names are the same between > text and bytea (e.g., ||, substr, position, length). It would be nice > not to introduce arbitrary differences. Well, because it doesn't operate on strings. I argued when we added string_agg that it ought to be called concat_agg, or something like that, but I got shouted down. So now here we are. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On fre, 2011-12-23 at 13:30 -0500, Robert Haas wrote: > On Fri, Dec 23, 2011 at 12:51 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > > On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: > >> this patch adds a bytea_agg aggregation. > >> > >> It allow fast bytea concatetation. > > > > Why not call it string_agg? All the function names are the same between > > text and bytea (e.g., ||, substr, position, length). It would be nice > > not to introduce arbitrary differences. > > Well, because it doesn't operate on strings. Sure, binary strings. Both the SQL standard and the PostgreSQL documentation use that term.
On Fri, Dec 23, 2011 at 2:16 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On fre, 2011-12-23 at 13:30 -0500, Robert Haas wrote: >> On Fri, Dec 23, 2011 at 12:51 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> > On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: >> >> this patch adds a bytea_agg aggregation. >> >> >> >> It allow fast bytea concatetation. >> > >> > Why not call it string_agg? All the function names are the same between >> > text and bytea (e.g., ||, substr, position, length). It would be nice >> > not to introduce arbitrary differences. >> >> Well, because it doesn't operate on strings. > > Sure, binary strings. Both the SQL standard and the PostgreSQL > documentation use that term. I'm unimpressed by that argument, but let's see what other people think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Peter Eisentraut <peter_e@gmx.net> wrote: >> Robert Haas wrote: >>> Peter Eisentraut <peter_e@gmx.net> wrote: >>>> On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: >>>>> this patch adds a bytea_agg aggregation. >>>>> >>>>> It allow fast bytea concatetation. >>>> >>>> Why not call it string_agg? All the function names are the >>>> same between text and bytea (e.g., ||, substr, position, >>>> length). It would be nice not to introduce arbitrary >>>> differences. >>> >>> Well, because it doesn't operate on strings. >> >> Sure, binary strings. Both the SQL standard and the PostgreSQL >> documentation use that term. > > I'm unimpressed by that argument, but let's see what other people > think. I, for one, try to be consistent about saying "character strings" when that is what I mean. Since at least the SQL-92 standard there have been both "character strings" and "bit strings", with a certain amount of symmetry in how they are handled. I don't remember when binary strings were introduced, but that is the standard terminology. There is, for example, a standard substring function for binary strings. -Kevin
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Dec 23, 2011 at 2:16 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> On fre, 2011-12-23 at 13:30 -0500, Robert Haas wrote: >>> Well, because it doesn't operate on strings. >> Sure, binary strings. �Both the SQL standard and the PostgreSQL >> documentation use that term. > I'm unimpressed by that argument, but let's see what other people think. I generally agree with Peter: string_agg makes sense here. The only real argument against it is Pavel's point that he didn't include a delimiter parameter, but that just begs the question why not. It seems at least plausible that there would be use-cases for it. So I think we should try to make this as much like the text case as possible. regards, tom lane
Hello 2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Dec 23, 2011 at 2:16 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >>> On fre, 2011-12-23 at 13:30 -0500, Robert Haas wrote: >>>> Well, because it doesn't operate on strings. > >>> Sure, binary strings. Both the SQL standard and the PostgreSQL >>> documentation use that term. > >> I'm unimpressed by that argument, but let's see what other people think. > > I generally agree with Peter: string_agg makes sense here. The only > real argument against it is Pavel's point that he didn't include a > delimiter parameter, but that just begs the question why not. It > seems at least plausible that there would be use-cases for it. I don't know a real usage for bytea delimiter. Probably there is, but I expect so most often use case will be without delimiter. And when it is necessary, then || should be used. I see two ways: a) use it bytea_agg as it now b) use a string_agg with delimiter, that will be usually empty. Using a string_agg for bytea is not too intuitive (but has sense) - maybe we can introduce a synonym type for bytea - like "binary string" or "bstring". Regards Pavel > > So I think we should try to make this as much like the text case as > possible. > > regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> wrote: > maybe we can introduce a synonym type for bytea - like "binary > string" or "bstring". The standard mentions these names for binary strings: BINARY, BINARY VARYING, or BINARY LARGE OBJECT which have a certain symmetry with: CHARACTER, CHARACTER VARYING, and CHARACTER LARGE OBJECT -Kevin
Excerpts from Pavel Stehule's message of vie dic 23 18:36:11 -0300 2011: > Hello > > 2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>: > > I generally agree with Peter: string_agg makes sense here. The only > > real argument against it is Pavel's point that he didn't include a > > delimiter parameter, but that just begs the question why not. It > > seems at least plausible that there would be use-cases for it. > > I don't know a real usage for bytea delimiter. Probably there is, but > I expect so most often use case will be without delimiter. Well, sometimes bytea is used to store character strings when the encoding information is to be handled by the app instead of having Postgres know it, for various reasons. I haven't seen any of those cases yet that would use string_add(bytea) but I wouldn't foreclose that possibility. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Dec 23, 2011 at 12:30 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Well, because it doesn't operate on strings. > > I argued when we added string_agg that it ought to be called > concat_agg, or something like that, but I got shouted down. So now > here we are. +1. Using the input type names to name the function is a mistake and should be stopped...enough. It's verbose and unnecessary...everything else in sql is heavily overloaded (we don't have int_max() and float_max()). merlin
On fre, 2011-12-23 at 19:51 +0200, Peter Eisentraut wrote: > On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: > > this patch adds a bytea_agg aggregation. > > > > It allow fast bytea concatetation. > > Why not call it string_agg? All the function names are the same between > text and bytea (e.g., ||, substr, position, length). It would be nice > not to introduce arbitrary differences. Here is a patch to do the renaming. As it stands, it fails the opr_sanity regression test, because that complains that there are now two aggregate functions string_agg with different number of arguments. It seems to me that that test should really only complain if the common argument types of the two aggregates are the same, correct?
Attachment
Peter Eisentraut <peter_e@gmx.net> writes: > On fre, 2011-12-23 at 19:51 +0200, Peter Eisentraut wrote: >> On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote: >>> this patch adds a bytea_agg aggregation. >> Why not call it string_agg? > Here is a patch to do the renaming. As it stands, it fails the > opr_sanity regression test, because that complains that there are now > two aggregate functions string_agg with different number of arguments. > It seems to me that that test should really only complain if the common > argument types of the two aggregates are the same, correct? Uh, no. That test is there for good and sufficient reasons, as per its comment: -- Check that there are not aggregates with the same name and different -- numbers of arguments. While not technically wrong, we have a project policy -- to avoid this because it opens the door for confusion in connection with -- ORDER BY: novices frequently put the ORDER BY in the wrong place. -- See the fate of the single-argument form of string_agg() for history. The renaming you propose would only be acceptable to those who have forgotten that history. I haven't. regards, tom lane
On Wed, Apr 4, 2012 at 11:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The renaming you propose would only be acceptable to those who have > forgotten that history. I haven't. I had. I looked it up http://archives.postgresql.org/pgsql-bugs/2010-08/msg00044.php That was quite a thread. -- greg
On ons, 2012-04-04 at 18:59 -0400, Tom Lane wrote: > > >> Why not call it string_agg? > > > Here is a patch to do the renaming. As it stands, it fails the > > opr_sanity regression test, because that complains that there are now > > two aggregate functions string_agg with different number of arguments. > > It seems to me that that test should really only complain if the common > > argument types of the two aggregates are the same, correct? > > Uh, no. That test is there for good and sufficient reasons, as per its > comment: > > -- Check that there are not aggregates with the same name and different > -- numbers of arguments. While not technically wrong, we have a project policy > -- to avoid this because it opens the door for confusion in connection with > -- ORDER BY: novices frequently put the ORDER BY in the wrong place. > -- See the fate of the single-argument form of string_agg() for history. > > The renaming you propose would only be acceptable to those who have > forgotten that history. I haven't. I had reviewed that thread very carefully, but I'm not sure it applies. The issue was that we don't want aggregates with optional second argument, because "novices" could confuse agg(a, b ORDER BY c) with agg(a ORDER BY b, c) -- wrong without the error being flagged. But that doesn't apply if the first argument has different types. Erroneously calling agg(textdatum ORDER BY textdatum, sthelse) will not result in a call to agg(bytea). (Unless the textdatum is really an unknown literal, but that would be silly.) Nevertheless, the problem would now be that adding string_agg(bytea) would effectively forbid adding string_agg(bytea, delim) in the future. So making a two-argument string_agg(bytea, bytea) now seems like the best solution anyway. (This applies independently of the function renaming, actually.)
Peter Eisentraut <peter_e@gmx.net> writes: > On ons, 2012-04-04 at 18:59 -0400, Tom Lane wrote: >> Uh, no. That test is there for good and sufficient reasons, as per its >> comment: > I had reviewed that thread very carefully, but I'm not sure it applies. > The issue was that we don't want aggregates with optional second > argument, because "novices" could confuse > agg(a, b ORDER BY c) > with > agg(a ORDER BY b, c) -- wrong > without the error being flagged. > But that doesn't apply if the first argument has different types. > Erroneously calling agg(textdatum ORDER BY textdatum, sthelse) will not > result in a call to agg(bytea). The point of the policy is to be sure that we will throw a specific error message with a suitable hint when a malformed call is made. If there are alternative aggregates in the catalogs that have the potential to "capture" such a call, then we risk the wrong thing happening. It appears that the lack of any implicit cast from bytea to text would prevent that today, but I still think this proposal boxes us in to an unreasonable degree compared to the benefit. For example, this would greatly restrict our ability to throw "ambiguous aggregate" messages. > Nevertheless, the problem would now be that adding string_agg(bytea) > would effectively forbid adding string_agg(bytea, delim) in the future. > So making a two-argument string_agg(bytea, bytea) now seems like the > best solution anyway. (This applies independently of the function > renaming, actually.) Hm. So are you now suggesting we should get rid of one-argument bytea_agg and replace it with two-argument string_agg(bytea,bytea)? I could support that, since we've not released bytea_agg yet. regards, tom lane
On lör, 2012-04-07 at 10:38 -0400, Tom Lane wrote: > > Nevertheless, the problem would now be that adding string_agg(bytea) > > would effectively forbid adding string_agg(bytea, delim) in the > future. > > So making a two-argument string_agg(bytea, bytea) now seems like the > > best solution anyway. (This applies independently of the function > > renaming, actually.) > > Hm. So are you now suggesting we should get rid of one-argument > bytea_agg and replace it with two-argument string_agg(bytea,bytea)? > I could support that, since we've not released bytea_agg yet. Yes, that looks like the best solution. Here is a patch for that.
Attachment
Peter Eisentraut <peter_e@gmx.net> writes: > On lör, 2012-04-07 at 10:38 -0400, Tom Lane wrote: >> Hm. So are you now suggesting we should get rid of one-argument >> bytea_agg and replace it with two-argument string_agg(bytea,bytea)? >> I could support that, since we've not released bytea_agg yet. > Yes, that looks like the best solution. Here is a patch for that. Looks sane in a quick once-over, except for the documentation entry. I'm not really thrilled with "text, text or bytea, bytea" because it seems easy to misparse. Moreover, as written the entry claims that the return type is text either way, which is wrong. You could fix the latter by writing "same as argument data type", but I wonder whether it'd be better to make separate table entries for the two forms. regards, tom lane