Thread: patch: bytea_agg

patch: bytea_agg

From

Pavel Stehule

Date:

21 December 2011, 06:05:14

Hello

this patch adds a bytea_agg aggregation.

It allow fast bytea concatetation.

Regards

Pavel Stehule

Attachment

bytea_agg.diff

Re: patch: bytea_agg

From

Robert Haas

Date:

22 December 2011, 12:49:42

On Wed, Dec 21, 2011 at 5:04 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> this patch adds a bytea_agg aggregation.
>
> It allow fast bytea concatetation.

Looks fine to me.  I'll commit this, barring objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: patch: bytea_agg

From

Robert Haas

Date:

23 December 2011, 09:43:09

On Thu, Dec 22, 2011 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 21, 2011 at 5:04 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> this patch adds a bytea_agg aggregation.
>>
>> It allow fast bytea concatetation.
>
> Looks fine to me.  I'll commit this, barring objections.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: patch: bytea_agg

From

Peter Eisentraut

Date:

23 December 2011, 13:51:56

On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
> this patch adds a bytea_agg aggregation.
> 
> It allow fast bytea concatetation.

Why not call it string_agg?  All the function names are the same between
text and bytea (e.g., ||, substr, position, length).  It would be nice
not to introduce arbitrary differences.

Re: patch: bytea_agg

From

Pavel Stehule

Date:

23 December 2011, 14:24:07

Hello

2011/12/23 Peter Eisentraut <peter_e@gmx.net>:
> On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
>> this patch adds a bytea_agg aggregation.
>>
>> It allow fast bytea concatetation.
>
> Why not call it string_agg?  All the function names are the same between
> text and bytea (e.g., ||, substr, position, length).  It would be nice
> not to introduce arbitrary differences.

My opinion is not too strong. I don't think so using string_agg is
good name (for bytea_agg) - as minimal (and only one) reason is
different API - there is no support for delimiter. If I remember well
discussion about string_agg, where delimiter is not optimal, there is
request for immutable interface for aggregates - there was a issue
with ORDER clause. So bytea_agg is good name.

Regards

Pavel

>

Re: patch: bytea_agg

From

Pavel Stehule

Date:

23 December 2011, 14:26:02

Hello

2011/12/23 Peter Eisentraut <peter_e@gmx.net>:
> On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
>> this patch adds a bytea_agg aggregation.
>>
>> It allow fast bytea concatetation.
>
> Why not call it string_agg?  All the function names are the same between
> text and bytea (e.g., ||, substr, position, length).  It would be nice
> not to introduce arbitrary differences.

My opinion is not strong. I don't think so using string_agg is good
name (- as minimal (and only one) reason is different API - there is
no support for delimiter. If I remember well discussion about
string_agg, where delimiter is not optimal, there is request for
immutable interface for aggregates - there was a issue with ORDER
clause. So bytea_agg is good name.

Regards

Pavel

>

Re: patch: bytea_agg

From

Robert Haas

Date:

23 December 2011, 14:30:47

On Fri, Dec 23, 2011 at 12:51 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
>> this patch adds a bytea_agg aggregation.
>>
>> It allow fast bytea concatetation.
>
> Why not call it string_agg?  All the function names are the same between
> text and bytea (e.g., ||, substr, position, length).  It would be nice
> not to introduce arbitrary differences.

Well, because it doesn't operate on strings.

I argued when we added string_agg that it ought to be called
concat_agg, or something like that, but I got shouted down.  So now
here we are.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: patch: bytea_agg

From

Peter Eisentraut

Date:

23 December 2011, 15:16:26

On fre, 2011-12-23 at 13:30 -0500, Robert Haas wrote:
> On Fri, Dec 23, 2011 at 12:51 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> > On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
> >> this patch adds a bytea_agg aggregation.
> >>
> >> It allow fast bytea concatetation.
> >
> > Why not call it string_agg?  All the function names are the same between
> > text and bytea (e.g., ||, substr, position, length).  It would be nice
> > not to introduce arbitrary differences.
> 
> Well, because it doesn't operate on strings.

Sure, binary strings.  Both the SQL standard and the PostgreSQL
documentation use that term.

Re: patch: bytea_agg

From

Robert Haas

Date:

23 December 2011, 15:19:51

On Fri, Dec 23, 2011 at 2:16 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On fre, 2011-12-23 at 13:30 -0500, Robert Haas wrote:
>> On Fri, Dec 23, 2011 at 12:51 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> > On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
>> >> this patch adds a bytea_agg aggregation.
>> >>
>> >> It allow fast bytea concatetation.
>> >
>> > Why not call it string_agg?  All the function names are the same between
>> > text and bytea (e.g., ||, substr, position, length).  It would be nice
>> > not to introduce arbitrary differences.
>>
>> Well, because it doesn't operate on strings.
>
> Sure, binary strings.  Both the SQL standard and the PostgreSQL
> documentation use that term.

I'm unimpressed by that argument, but let's see what other people think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: patch: bytea_agg

From

"Kevin Grittner"

Date:

23 December 2011, 15:35:46

Robert Haas <robertmhaas@gmail.com> wrote:
> Peter Eisentraut <peter_e@gmx.net> wrote:
>> Robert Haas wrote:
>>> Peter Eisentraut <peter_e@gmx.net> wrote:
>>>> On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
>>>>> this patch adds a bytea_agg aggregation.
>>>>>
>>>>> It allow fast bytea concatetation.
>>>>
>>>> Why not call it string_agg?  All the function names are the
>>>> same between text and bytea (e.g., ||, substr, position,
>>>> length).  It would be nice not to introduce arbitrary
>>>> differences.
>>>
>>> Well, because it doesn't operate on strings.
>>
>> Sure, binary strings.  Both the SQL standard and the PostgreSQL
>> documentation use that term.
> 
> I'm unimpressed by that argument, but let's see what other people
> think.
I, for one, try to be consistent about saying "character strings"
when that is what I mean.  Since at least the SQL-92 standard there
have been both "character strings" and "bit strings", with a certain
amount of symmetry in how they are handled.   I don't remember when
binary strings were introduced, but that is the standard
terminology.  There is, for example, a standard substring function
for binary strings.
-Kevin

Re: patch: bytea_agg

From

Tom Lane

Date:

23 December 2011, 17:20:17

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Dec 23, 2011 at 2:16 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On fre, 2011-12-23 at 13:30 -0500, Robert Haas wrote:
>>> Well, because it doesn't operate on strings.

>> Sure, binary strings. �Both the SQL standard and the PostgreSQL
>> documentation use that term.

> I'm unimpressed by that argument, but let's see what other people think.

I generally agree with Peter: string_agg makes sense here.  The only
real argument against it is Pavel's point that he didn't include a
delimiter parameter, but that just begs the question why not.  It
seems at least plausible that there would be use-cases for it.

So I think we should try to make this as much like the text case as
possible.
        regards, tom lane

Re: patch: bytea_agg

From

Pavel Stehule

Date:

23 December 2011, 17:37:09

Hello

2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Dec 23, 2011 at 2:16 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>> On fre, 2011-12-23 at 13:30 -0500, Robert Haas wrote:
>>>> Well, because it doesn't operate on strings.
>
>>> Sure, binary strings.  Both the SQL standard and the PostgreSQL
>>> documentation use that term.
>
>> I'm unimpressed by that argument, but let's see what other people think.
>
> I generally agree with Peter: string_agg makes sense here.  The only
> real argument against it is Pavel's point that he didn't include a
> delimiter parameter, but that just begs the question why not.  It
> seems at least plausible that there would be use-cases for it.

I don't know a real usage for bytea delimiter. Probably there is, but
I expect so most often use case will be without delimiter. And when it
is necessary, then || should be used. I see two ways:

a) use it bytea_agg as it now
b) use a string_agg with delimiter, that will be usually empty.

Using a string_agg for bytea is not too intuitive (but has sense) -
maybe we can introduce a synonym type for bytea - like "binary string"
or "bstring".

Regards

Pavel

>
> So I think we should try to make this as much like the text case as
> possible.
>
>                        regards, tom lane

Re: patch: bytea_agg

From

"Kevin Grittner"

Date:

23 December 2011, 17:49:16

Pavel Stehule <pavel.stehule@gmail.com> wrote:
> maybe we can introduce a synonym type for bytea - like "binary
> string" or "bstring".
The standard mentions these names for binary strings:
BINARY, BINARY VARYING, or BINARY LARGE OBJECT
which have a certain symmetry with:
CHARACTER, CHARACTER VARYING, and CHARACTER
LARGE OBJECT
-Kevin

Re: patch: bytea_agg

From

Alvaro Herrera

Date:

23 December 2011, 21:44:42

Excerpts from Pavel Stehule's message of vie dic 23 18:36:11 -0300 2011:
> Hello
>
> 2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:

> > I generally agree with Peter: string_agg makes sense here.  The only
> > real argument against it is Pavel's point that he didn't include a
> > delimiter parameter, but that just begs the question why not.  It
> > seems at least plausible that there would be use-cases for it.
>
> I don't know a real usage for bytea delimiter. Probably there is, but
> I expect so most often use case will be without delimiter.

Well, sometimes bytea is used to store character strings when the
encoding information is to be handled by the app instead of having
Postgres know it, for various reasons.  I haven't seen any of those
cases yet that would use string_add(bytea) but I wouldn't foreclose that
possibility.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: patch: bytea_agg

From

Merlin Moncure

Date:

24 December 2011, 09:13:49

On Fri, Dec 23, 2011 at 12:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, because it doesn't operate on strings.
>
> I argued when we added string_agg that it ought to be called
> concat_agg, or something like that, but I got shouted down.  So now
> here we are.

+1.  Using the input type names to name the function is a mistake and
should be stopped...enough.  It's verbose and unnecessary...everything
else in sql is heavily overloaded (we don't have int_max() and
float_max()).

merlin

Re: patch: bytea_agg

From

Peter Eisentraut

Date:

04 April 2012, 17:20:38

On fre, 2011-12-23 at 19:51 +0200, Peter Eisentraut wrote:
> On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
> > this patch adds a bytea_agg aggregation.
> >
> > It allow fast bytea concatetation.
>
> Why not call it string_agg?  All the function names are the same between
> text and bytea (e.g., ||, substr, position, length).  It would be nice
> not to introduce arbitrary differences.

Here is a patch to do the renaming.  As it stands, it fails the
opr_sanity regression test, because that complains that there are now
two aggregate functions string_agg with different number of arguments.
It seems to me that that test should really only complain if the common
argument types of the two aggregates are the same, correct?

Attachment

bytea-string-agg.patch

Re: patch: bytea_agg

From

Tom Lane

Date:

04 April 2012, 19:59:55

Peter Eisentraut <peter_e@gmx.net> writes:
> On fre, 2011-12-23 at 19:51 +0200, Peter Eisentraut wrote:
>> On ons, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
>>> this patch adds a bytea_agg aggregation.

>> Why not call it string_agg?

> Here is a patch to do the renaming.  As it stands, it fails the
> opr_sanity regression test, because that complains that there are now
> two aggregate functions string_agg with different number of arguments.
> It seems to me that that test should really only complain if the common
> argument types of the two aggregates are the same, correct?

Uh, no.  That test is there for good and sufficient reasons, as per its
comment:

-- Check that there are not aggregates with the same name and different
-- numbers of arguments.  While not technically wrong, we have a project policy
-- to avoid this because it opens the door for confusion in connection with
-- ORDER BY: novices frequently put the ORDER BY in the wrong place.
-- See the fate of the single-argument form of string_agg() for history.

The renaming you propose would only be acceptable to those who have
forgotten that history.  I haven't.
        regards, tom lane

Re: patch: bytea_agg

From

Greg Stark

Date:

04 April 2012, 20:23:56

On Wed, Apr 4, 2012 at 11:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The renaming you propose would only be acceptable to those who have
> forgotten that history.  I haven't.

I had. I looked it up
http://archives.postgresql.org/pgsql-bugs/2010-08/msg00044.php

That was quite a thread.

--
greg

Re: patch: bytea_agg

From

Peter Eisentraut

Date:

07 April 2012, 07:52:21

On ons, 2012-04-04 at 18:59 -0400, Tom Lane wrote:
> 
> >> Why not call it string_agg?
> 
> > Here is a patch to do the renaming.  As it stands, it fails the
> > opr_sanity regression test, because that complains that there are now
> > two aggregate functions string_agg with different number of arguments.
> > It seems to me that that test should really only complain if the common
> > argument types of the two aggregates are the same, correct?
> 
> Uh, no.  That test is there for good and sufficient reasons, as per its
> comment:
> 
> -- Check that there are not aggregates with the same name and different
> -- numbers of arguments.  While not technically wrong, we have a project policy
> -- to avoid this because it opens the door for confusion in connection with
> -- ORDER BY: novices frequently put the ORDER BY in the wrong place.
> -- See the fate of the single-argument form of string_agg() for history.
> 
> The renaming you propose would only be acceptable to those who have
> forgotten that history.  I haven't.

I had reviewed that thread very carefully, but I'm not sure it applies.
The issue was that we don't want aggregates with optional second
argument, because "novices" could confuse

agg(a, b ORDER BY c)

with

agg(a ORDER BY b, c)  -- wrong

without the error being flagged.

But that doesn't apply if the first argument has different types.
Erroneously calling agg(textdatum ORDER BY textdatum, sthelse) will not
result in a call to agg(bytea).  (Unless the textdatum is really an
unknown literal, but that would be silly.)

Nevertheless, the problem would now be that adding string_agg(bytea)
would effectively forbid adding string_agg(bytea, delim) in the future.
So making a two-argument string_agg(bytea, bytea) now seems like the
best solution anyway.  (This applies independently of the function
renaming, actually.)

Re: patch: bytea_agg

From

Tom Lane

Date:

07 April 2012, 11:38:30

Peter Eisentraut <peter_e@gmx.net> writes:
> On ons, 2012-04-04 at 18:59 -0400, Tom Lane wrote:
>> Uh, no.  That test is there for good and sufficient reasons, as per its
>> comment:

> I had reviewed that thread very carefully, but I'm not sure it applies.
> The issue was that we don't want aggregates with optional second
> argument, because "novices" could confuse

> agg(a, b ORDER BY c)

> with

> agg(a ORDER BY b, c)  -- wrong

> without the error being flagged.

> But that doesn't apply if the first argument has different types.
> Erroneously calling agg(textdatum ORDER BY textdatum, sthelse) will not
> result in a call to agg(bytea).

The point of the policy is to be sure that we will throw a specific
error message with a suitable hint when a malformed call is made.
If there are alternative aggregates in the catalogs that have the
potential to "capture" such a call, then we risk the wrong thing
happening.  It appears that the lack of any implicit cast from bytea to
text would prevent that today, but I still think this proposal boxes
us in to an unreasonable degree compared to the benefit.  For example,
this would greatly restrict our ability to throw "ambiguous aggregate"
messages.

> Nevertheless, the problem would now be that adding string_agg(bytea)
> would effectively forbid adding string_agg(bytea, delim) in the future.
> So making a two-argument string_agg(bytea, bytea) now seems like the
> best solution anyway.  (This applies independently of the function
> renaming, actually.)

Hm.  So are you now suggesting we should get rid of one-argument
bytea_agg and replace it with two-argument string_agg(bytea,bytea)?
I could support that, since we've not released bytea_agg yet.
        regards, tom lane

Re: patch: bytea_agg

From

Peter Eisentraut

Date:

11 April 2012, 14:47:13

On lör, 2012-04-07 at 10:38 -0400, Tom Lane wrote:
> > Nevertheless, the problem would now be that adding string_agg(bytea)
> > would effectively forbid adding string_agg(bytea, delim) in the
> future.
> > So making a two-argument string_agg(bytea, bytea) now seems like the
> > best solution anyway.  (This applies independently of the function
> > renaming, actually.)
>
> Hm.  So are you now suggesting we should get rid of one-argument
> bytea_agg and replace it with two-argument string_agg(bytea,bytea)?
> I could support that, since we've not released bytea_agg yet.

Yes, that looks like the best solution.  Here is a patch for that.

Attachment

bytea-string-agg.patch

Re: patch: bytea_agg

From

Tom Lane

Date:

11 April 2012, 14:57:32

Peter Eisentraut <peter_e@gmx.net> writes:
> On lör, 2012-04-07 at 10:38 -0400, Tom Lane wrote:
>> Hm.  So are you now suggesting we should get rid of one-argument
>> bytea_agg and replace it with two-argument string_agg(bytea,bytea)?
>> I could support that, since we've not released bytea_agg yet.

> Yes, that looks like the best solution.  Here is a patch for that.

Looks sane in a quick once-over, except for the documentation entry.
I'm not really thrilled with "text, text or bytea, bytea" because it
seems easy to misparse.  Moreover, as written the entry claims that
the return type is text either way, which is wrong.  You could fix
the latter by writing "same as argument data type", but I wonder
whether it'd be better to make separate table entries for the two
forms.
        regards, tom lane