Thread: Review: listagg aggregate

Review: listagg aggregate

From

David E. Wheeler

Date:

22 January 2010, 15:14:28

Pavel,

My review of your listagg patch.

Submission Review
-----------------
* The diff is a context diff and applies cleanly to HEAD (with just two hunks offset by 2 lines each).

* There is documentation, though I'm not sure it needs to be mentioned in the string functions documentation. No harm
init, I guess. 

  I would like to see an example, though, and the documentation does not currently explain what each of the parameters
arefor. In fact, it looks like all the existing aggregates take only one parameter, so there was not previously a need
toexplain it. But listagg() has an optional second param. I think that the description should explain what it's for. 

* There are tests and they look fine.

Usability Review
----------------
* The patch does in fact implement the aggregate function it describes, and OH YES do we want it (I've written my own
inSQL a few times). 

* No, we don't already have it.

* Yes it follows community-agreed behavior. I'm assuming that there is no special parsing of aggregate functions, so
thesimple use of commas to separate the two parameters is appropriate, rather than using a keyword like MySQL's
SEPARATORin the group_concat() aggregate. 

* No need to have pg_dump support, no dangers that I can see, looks like all the bases have been covered.

Feature Test
------------
* Everything built cleanly, but I got an OID dupe error when I tried to init the DB. Looks like 2997 and 2998 have been
usedfor something else since you created the patch. I changed them to 2995 and 2996 and then it worked. 
* The feature appears to work. I didn't see any tests for encodings or other data types, so I ran a few myself and they
workfine: 

postgres=# select listagg(a, U&'-\0441\043B\043E\043D-') from (values('aaaa'),('bbbb'),('cccc'
         listagg
--------------------------
 aaaa-слон-bbbb-слон-cccc
(1 row)

postgres=# select listagg(a, U&'\2014') from
(values(U&'\0441\043B\043E\043D'),(U&'d\0061t\+000061'),(U&'\0441\043B\043E\043D'))AS g(a); 
    listagg
----------------
 слон—data—слон
(1 row)


postgres=# select listagg(a::text) from (values(1),(2),(3)) AS g(a);
 listagg
---------
 123
(1 row)


Performance Review
------------------

No performance issues, except that it should be faster than a custom aggregate that does the same thing. To test, I
createda quick custom aggregate (no second argument, alas, so listagg() is more flexible) like so: 

    CREATE OR REPLACE FUNCTION a2s(ANYARRAY)
    RETURNS TEXT LANGUAGE SQL AS $$
        SELECT array_to_string($1, ',');
    $$;

   CREATE AGGREGATE string_accum (
        SFUNC    = array_accum,
        BASETYPE = ANYELEMENT,
        STYPE    = ANYARRAY,
        INITCOND = '{}',
        FINALFUNC = a2s
    );

Then I ran some simple tests (thanks for the clue, depesz):

postgres=# select count(*) from (select string_accum(a) from (values('aaaa'),('bbbb'),('cccc')) AS g(a),
generate_series(1,10000)i) AS x(i); 
 count
-------
     1
(1 row)

Time: 1365.382 ms

postgres=# select count(*) from (select listagg(a) from (values('aaaa'),('bbbb'),('cccc')) AS g(a),
generate_series(1,10000)i) AS x(i); 
 count
-------
     1
(1 row)

Time: 17.989 ms

So overall, it looks like listagg() is 1-2 orders of magnitude faster. YMMV, and my system is built with
--enable-cassertand --enable-debug. Still, good job. 

Coding Review
-------------

* Is varchar.c really the best place to put the ListAggState struct and the listagg() function? I grepped the source
forarray_agg() and it's in src/backend/utils/adt/array_userfuncs.c. Maybe there's an equivalent file for string
functions?Otherwise, the style of the C code looks fine to my untrained eye. 

  Actually, shouldn't it return text rather than varchar?

* Does it really require four functions to do its work? Might there be some way to use the array_agg() C functions and
thenjust a different final function to turn it into a string (using the internal array_to_string() function, perhaps)?
I'mnot at all sure about it, but given how little code was required to create the same basic functionality in SQL, I'm
surprisedthat the C implementation requires four functions (accumStringResult(), listagg1_transfn(),
listagg2_transfn(),and listagg_finalfn()). Maybe they're required to make it fast and avoid the overhead of an array? 

* No compiler warnings, I never made it crash, good comments, does what it says on the tin. I doubt that there are any
portabilityissues, as the code seems to use standard PostgreSQL internal macros and functions. 

Architecture Review
-------------------

* No dependencies, things seem to make sense overall, notwithstanding my questions in the Coding Review.

Review Review
-------------

The only thing I haven't covered so far is the name. I agree with Tom's assertion that the name is awful. Sure there
maybe a precedent in Oracle, but I hardly find that convincing (some of the big corporations seem to do a really shitty
jobnaming things in their APIs). Given that we already have array_agg(), what about simply concat_agg(), as Tom
(wincingly,I grant you) suggested? Or string_agg()? 

My bike shed is puce.

Attached is a new patch with the changed OIDs and an added phrase in the documentation that indicates that the second
argumentcan be used to separate the concatenated values. 

Best,

David

Attachment

listagg.patch

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

24 January 2010, 05:19:54

Hello

2010/1/22 David E. Wheeler <david@kineticode.com>:
> Pavel,
>
> My review of your listagg patch.
>
> Submission Review
> -----------------
> * The diff is a context diff and applies cleanly to HEAD (with just two hunks offset by 2 lines each).
>
> * There is documentation, though I'm not sure it needs to be mentioned in the string functions documentation. No harm
init, I guess. 
>
>  I would like to see an example, though, and the documentation does not currently explain what each of the parameters
arefor. In fact, it looks like all the existing aggregates take only one parameter, so there was not previously a need
toexplain it. But listagg() has an optional second param. I think that the description should explain what it's for. 
>

can I help with it, please. My English is terrible.

> * There are tests and they look fine.
>
> Usability Review
> ----------------
> * The patch does in fact implement the aggregate function it describes, and OH YES do we want it (I've written my own
inSQL a few times). 
>
> * No, we don't already have it.
>
> * Yes it follows community-agreed behavior. I'm assuming that there is no special parsing of aggregate functions, so
thesimple use of commas to separate the two parameters is appropriate, rather than using a keyword like MySQL's
SEPARATORin the group_concat() aggregate. 
>
> * No need to have pg_dump support, no dangers that I can see, looks like all the bases have been covered.
>
> Feature Test
> ------------
> * Everything built cleanly, but I got an OID dupe error when I tried to init the DB. Looks like 2997 and 2998 have
beenused for something else since you created the patch. I changed them to 2995 and 2996 and then it worked. 
> * The feature appears to work. I didn't see any tests for encodings or other data types, so I ran a few myself and
theywork fine: 
>
> postgres=# select listagg(a, U&'-\0441\043B\043E\043D-') from (values('aaaa'),('bbbb'),('cccc'
>         listagg
> --------------------------
>  aaaa-слон-bbbb-слон-cccc
> (1 row)
>
> postgres=# select listagg(a, U&'\2014') from
(values(U&'\0441\043B\043E\043D'),(U&'d\0061t\+000061'),(U&'\0441\043B\043E\043D'))AS g(a); 
>    listagg
> ----------------
>  слон—data—слон
> (1 row)
>
>
> postgres=# select listagg(a::text) from (values(1),(2),(3)) AS g(a);
>  listagg
> ---------
>  123
> (1 row)
>
>
> Performance Review
> ------------------
>
> No performance issues, except that it should be faster than a custom aggregate that does the same thing. To test, I
createda quick custom aggregate (no second argument, alas, so listagg() is more flexible) like so: 
>
>    CREATE OR REPLACE FUNCTION a2s(ANYARRAY)
>    RETURNS TEXT LANGUAGE SQL AS $$
>        SELECT array_to_string($1, ',');
>    $$;
>
>   CREATE AGGREGATE string_accum (
>        SFUNC    = array_accum,
>        BASETYPE = ANYELEMENT,
>        STYPE    = ANYARRAY,
>        INITCOND = '{}',
>        FINALFUNC = a2s
>    );
>
> Then I ran some simple tests (thanks for the clue, depesz):
>
> postgres=# select count(*) from (select string_accum(a) from (values('aaaa'),('bbbb'),('cccc')) AS g(a),
generate_series(1,10000)i) AS x(i); 
>  count
> -------
>     1
> (1 row)
>
> Time: 1365.382 ms
>
> postgres=# select count(*) from (select listagg(a) from (values('aaaa'),('bbbb'),('cccc')) AS g(a),
generate_series(1,10000)i) AS x(i); 
>  count
> -------
>     1
> (1 row)
>
> Time: 17.989 ms
>
> So overall, it looks like listagg() is 1-2 orders of magnitude faster. YMMV, and my system is built with
--enable-cassertand --enable-debug. Still, good job. 
>
> Coding Review
> -------------
>
> * Is varchar.c really the best place to put the ListAggState struct and the listagg() function? I grepped the source
forarray_agg() and it's in src/backend/utils/adt/array_userfuncs.c. Maybe there's an equivalent file for string
functions?Otherwise, the style of the C code looks fine to my untrained eye. 
>

array user functions are used more time in pg core. The complexity of
array functions are much higher, so I don't think we need special
file.

>  Actually, shouldn't it return text rather than varchar?
>

I'll recheck it. I am sure so all parameters should be a text.

> * Does it really require four functions to do its work? Might there be some way to use the array_agg() C functions
andthen just a different final function to turn it into a string (using the internal array_to_string() function,
perhaps)?

We can, but it isn't good way. Processing of arrays is little bit more
expensive then processing plain text. It is reason why listagg is
faster, than your custom aggregate. More, the final function could be
faster - the content is final.
I'm not at all sure about it, but given how little code was required
to create the same basic functionality in SQL, I'm surprised that the
C implementation requires four functions (accumStringResult(),
listagg1_transfn(), listagg2_transfn(), and listagg_finalfn()). Maybe
they're required to make it fast and avoid the overhead of an array?

It normal for aggregate functions. We need more transfn function,
because we need two two variant: listagg(col), listagg(col, sep). Our
coding guidlines doesn't advice share C functions - but these
functions are +/- wrapper for accumStringResult - so there is zero
overhead.

>
> * No compiler warnings, I never made it crash, good comments, does what it says on the tin. I doubt that there are
anyportability issues, as the code seems to use standard PostgreSQL internal macros and functions. 
>
> Architecture Review
> -------------------
>
> * No dependencies, things seem to make sense overall, notwithstanding my questions in the Coding Review.
>
> Review Review
> -------------
>
> The only thing I haven't covered so far is the name. I agree with Tom's assertion that the name is awful. Sure there
maybe a precedent in Oracle, but I hardly find that convincing (some of the big corporations seem to do a really shitty
jobnaming things in their APIs). Given that we already have array_agg(), what about simply concat_agg(), as Tom
(wincingly,I grant you) suggested? Or string_agg()? 

I don't think. When we have function, with same parameters, same
behave like some Oracle function, then I am strongly prefer Oracle
name. I don't see any benefit from different name. It can only confuse
developers and add the trable to people who porting applications.


>
> My bike shed is puce.
>
> Attached is a new patch with the changed OIDs and an added phrase in the documentation that indicates that the second
argumentcan be used to separate the concatenated values. 
>
> Best,
>

Thank you very much
Regards
Pavel

> David
>
>
>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

24 January 2010, 12:57:52

On Jan 24, 2010, at 1:19 AM, Pavel Stehule wrote:

> can I help with it, please. My English is terrible.

Yes, I added a bit in the patch I submitted.

> array user functions are used more time in pg core. The complexity of
> array functions are much higher, so I don't think we need special
> file.

Okay. Should have tried it in PL/pgSQL then, perhaps.

> I'll recheck it. I am sure so all parameters should be a text.

Probably shouldn't go into varchar.c then, yes?

> We can, but it isn't good way. Processing of arrays is little bit more
> expensive then processing plain text. It is reason why listagg is
> faster, than your custom aggregate. More, the final function could be
> faster - the content is final.

Understood.

> It normal for aggregate functions. We need more transfn function,
> because we need two two variant: listagg(col), listagg(col, sep). Our
> coding guidlines doesn't advice share C functions - but these
> functions are +/- wrapper for accumStringResult - so there is zero
> overhead.

Ah, okay, it's because of the second argument. Now I understand.

> I don't think. When we have function, with same parameters, same
> behave like some Oracle function, then I am strongly prefer Oracle
> name. I don't see any benefit from different name. It can only confuse
> developers and add the trable to people who porting applications.

Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who
needit. 

Best,

David

Re: Review: listagg aggregate

From

Simon Riggs

Date:

24 January 2010, 14:03:06

On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:
> No performance issues

ISTM that this class of function is inherently dangerous performance
wise. 

* It looks incredibly easy to construct enormous lists. We should test
the explosion limit of this to see how it is handled. Perhaps we need
some parameter limits to control that, depending upon results.

* Optimizer doesn't consider whether the result type of an aggregate get
bigger as the aggregate processes more rows. If we're adding this
function we should give some thought in that area also, or at least a
comment to note that it can and will cause the optimizer problems in
complex queries.

-- Simon Riggs           www.2ndQuadrant.com

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

24 January 2010, 14:40:29

2010/1/24 Simon Riggs <simon@2ndquadrant.com>:
> On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:
>> No performance issues
>
> ISTM that this class of function is inherently dangerous performance
> wise.

there is potencial risk, but this risk isn't new. The almost all what
you say is true for array aggregates.

>
> * It looks incredibly easy to construct enormous lists. We should test
> the explosion limit of this to see how it is handled. Perhaps we need
> some parameter limits to control that, depending upon results.

There are no limit for generating large values - like bytea, xml, or
text. There are not limit for array_accum or array(query). So, I don't
think we need some special mechanism for listagg. If somebody will
generate too large a value, then he will get "out of memory"
exception.

>
> * Optimizer doesn't consider whether the result type of an aggregate get
> bigger as the aggregate processes more rows. If we're adding this
> function we should give some thought in that area also, or at least a
> comment to note that it can and will cause the optimizer problems in
> complex queries.
>

this is true, but this isn't some new. array_accum working well
without optimizer problems.

> --
>  Simon Riggs           www.2ndQuadrant.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: Review: listagg aggregate

From

Tom Lane

Date:

24 January 2010, 14:44:32

Simon Riggs <simon@2ndQuadrant.com> writes:
> On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:
>> No performance issues

> ISTM that this class of function is inherently dangerous performance
> wise. 

> * It looks incredibly easy to construct enormous lists. We should test
> the explosion limit of this to see how it is handled. Perhaps we need
> some parameter limits to control that, depending upon results.

> * Optimizer doesn't consider whether the result type of an aggregate get
> bigger as the aggregate processes more rows. If we're adding this
> function we should give some thought in that area also, or at least a
> comment to note that it can and will cause the optimizer problems in
> complex queries.

We have that problem already with array_agg(), and I don't recall many
complaints about it.  It might be worth worrying about at some point,
but I don't think it's reasonable to insist that it be fixed before
any more such aggregates are created.

I agree that testing-to-failure would be a good idea just to be sure it
fails cleanly.
        regards, tom lane

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

24 January 2010, 14:45:12

2010/1/24 David E. Wheeler <david@kineticode.com>:
> On Jan 24, 2010, at 1:19 AM, Pavel Stehule wrote:
>
>> can I help with it, please. My English is terrible.
>
> Yes, I added a bit in the patch I submitted.
>
>> array user functions are used more time in pg core. The complexity of
>> array functions are much higher, so I don't think we need special
>> file.
>
> Okay. Should have tried it in PL/pgSQL then, perhaps.
>

:) - I'll look on it again.

>> I'll recheck it. I am sure so all parameters should be a text.
>
> Probably shouldn't go into varchar.c then, yes?
>
>> We can, but it isn't good way. Processing of arrays is little bit more
>> expensive then processing plain text. It is reason why listagg is
>> faster, than your custom aggregate. More, the final function could be
>> faster - the content is final.
>
> Understood.
>
>> It normal for aggregate functions. We need more transfn function,
>> because we need two two variant: listagg(col), listagg(col, sep). Our
>> coding guidlines doesn't advice share C functions - but these
>> functions are +/- wrapper for accumStringResult - so there is zero
>> overhead.
>
> Ah, okay, it's because of the second argument. Now I understand.
>
>> I don't think. When we have function, with same parameters, same
>> behave like some Oracle function, then I am strongly prefer Oracle
>> name. I don't see any benefit from different name. It can only confuse
>> developers and add the trable to people who porting applications.
>
> Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who
needit.
 
>

The "list" is common name for this content - it is usual in Microsoft
SQL Server, it is usual in Oracle. Maybe we can vote about the name

Regards
Pavel Stehule

> Best,
>
> David
>
>

Re: Review: listagg aggregate

From

Scott Bailey

Date:

25 January 2010, 01:29:52

>> I don't think. When we have function, with same parameters, same
>> behave like some Oracle function, then I am strongly prefer Oracle
>> name. I don't see any benefit from different name. It can only confuse
>> developers and add the trable to people who porting applications.
> 
> Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who
needit.
 

The corresponding function in Oracle is called wm_concat. In MySQL its 
called group_concat. I don't believe DB2 or SQL Server have built in 
equivalents. The Oracle name isn't really an option ("wm' stands for 
workspace manager)

I think listagg or string_agg would be the most appropriate names. Oh 
and before Oracle had wm_concat, Tom Kyte wrote a function called stragg 
that was pretty popular.

Scott

Re: Review: listagg aggregate

From

Peter Eisentraut

Date:

25 January 2010, 06:16:28

On sön, 2010-01-24 at 21:29 -0800, Scott Bailey wrote:
> I think listagg or string_agg would be the most appropriate names. Oh 
> and before Oracle had wm_concat, Tom Kyte wrote a function called
> stragg that was pretty popular.

Well,

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

So it's pretty clear that listagg does not fit into this scheme.

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

25 January 2010, 06:35:12

2010/1/25 Peter Eisentraut <peter_e@gmx.net>:
> On sön, 2010-01-24 at 21:29 -0800, Scott Bailey wrote:
>> I think listagg or string_agg would be the most appropriate names. Oh
>> and before Oracle had wm_concat, Tom Kyte wrote a function called
>> stragg that was pretty popular.
>
> Well,
>
> xmlagg -> concatenates values to form xml datum
> array_agg -> concatenates values to form array datum
> ??? -> concatenates values to form string datum
>
> So it's pretty clear that listagg does not fit into this scheme.

when you define list as text domain, then this the name is correct.

searching "list sql string" on google. You can see, so "list" is usually used.

Regards
Pavel

>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: Review: listagg aggregate

From

Tom Lane

Date:

25 January 2010, 09:57:26

Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2010/1/25 Peter Eisentraut <peter_e@gmx.net>:
>> xmlagg -> concatenates values to form xml datum
>> array_agg -> concatenates values to form array datum
>> ??? -> concatenates values to form string datum
>> 
>> So it's pretty clear that listagg does not fit into this scheme.

> when you define list as text domain, then this the name is correct.

IOW, if you define away the problem then there's no problem?

I agree that "list" is a terrible choice of name here.  "string_agg"
seemed reasonable and in keeping with the standardized "array_agg".
        regards, tom lane

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

25 January 2010, 10:12:50

2010/1/25 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2010/1/25 Peter Eisentraut <peter_e@gmx.net>:
>>> xmlagg -> concatenates values to form xml datum
>>> array_agg -> concatenates values to form array datum
>>> ??? -> concatenates values to form string datum
>>>
>>> So it's pretty clear that listagg does not fit into this scheme.
>
>> when you define list as text domain, then this the name is correct.
>
> IOW, if you define away the problem then there's no problem?
>
> I agree that "list" is a terrible choice of name here.  "string_agg"
> seemed reasonable and in keeping with the standardized "array_agg".
>

I am not happy, I thing so we do bigger chaos then it is. But it
hasn't progress. So I agree with name string_agg. In this case isn't a
problem rename this function if somebody would.

I'll send patch over hour.

regards
Pavel Stehule

>                        regards, tom lane
>

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

25 January 2010, 10:56:20

2010/1/25 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2010/1/25 Peter Eisentraut <peter_e@gmx.net>:
>>> xmlagg -> concatenates values to form xml datum
>>> array_agg -> concatenates values to form array datum
>>> ??? -> concatenates values to form string datum
>>>
>>> So it's pretty clear that listagg does not fit into this scheme.
>
>> when you define list as text domain, then this the name is correct.
>
> IOW, if you define away the problem then there's no problem?
>
> I agree that "list" is a terrible choice of name here.  "string_agg"
> seemed reasonable and in keeping with the standardized "array_agg".

actualised patch - the name is string_agg

regards
Pavel Stehule

>
>                        regards, tom lane
>

Attachment

string_agg.diff

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

25 January 2010, 11:05:13

2010/1/24 Tom Lane <tgl@sss.pgh.pa.us>:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:
>>> No performance issues
>
>> ISTM that this class of function is inherently dangerous performance
>> wise.
>
>> * It looks incredibly easy to construct enormous lists. We should test
>> the explosion limit of this to see how it is handled. Perhaps we need
>> some parameter limits to control that, depending upon results.
>
>> * Optimizer doesn't consider whether the result type of an aggregate get
>> bigger as the aggregate processes more rows. If we're adding this
>> function we should give some thought in that area also, or at least a
>> comment to note that it can and will cause the optimizer problems in
>> complex queries.
>
> We have that problem already with array_agg(), and I don't recall many
> complaints about it.  It might be worth worrying about at some point,
> but I don't think it's reasonable to insist that it be fixed before
> any more such aggregates are created.
>
> I agree that testing-to-failure would be a good idea just to be sure it
> fails cleanly.

postgres=# \timing
Timing is on.
postgres=# select
pg_size_pretty(length(string_agg('012345678901234567890'::text,',')))
from generate_series(1,10000000) g(i);pg_size_pretty
----------------210 MB
(1 row)

Time: 5831,218 ms
postgres=# select
pg_size_pretty(length(string_agg('012345678901234567890'::text,',')))
from generate_series(1,50000000) g(i);
^[^[ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1073741812 bytes by
21 more bytes.
postgres=#

I thing, so 210 MB is more then is necessary :)

Regards
Pavel Stehule


>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

25 January 2010, 16:10:00

On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:

> xmlagg -> concatenates values to form xml datum
> array_agg -> concatenates values to form array datum
> ??? -> concatenates values to form string datum

concat_agg().

David

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

25 January 2010, 16:10:11

On Jan 25, 2010, at 6:12 AM, Pavel Stehule wrote:

> I am not happy, I thing so we do bigger chaos then it is. But it
> hasn't progress. So I agree with name string_agg. In this case isn't a
> problem rename this function if somebody would.

Could you not use CREATE AGGREGATE to create an alias named listagg if you needed it? Or are we limited to a single
argumentin that case? 

David

Re: Review: listagg aggregate

From

Robert Haas

Date:

25 January 2010, 16:20:39

On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:
>> xmlagg -> concatenates values to form xml datum
>> array_agg -> concatenates values to form array datum
>> ??? -> concatenates values to form string datum
>
> concat_agg().

I like that one...

...Robert

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

26 January 2010, 02:59:49

2010/1/25 David E. Wheeler <david@kineticode.com>:
> On Jan 25, 2010, at 6:12 AM, Pavel Stehule wrote:
>
>> I am not happy, I thing so we do bigger chaos then it is. But it
>> hasn't progress. So I agree with name string_agg. In this case isn't a
>> problem rename this function if somebody would.
>
> Could you not use CREATE AGGREGATE to create an alias named listagg if you needed it? Or are we limited to a single
argumentin that case?
 

we can, but without one common known name we will have a propriety name.

Pavel

>
> David
>
>
>
>

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

26 January 2010, 03:14:32

2010/1/25 Robert Haas <robertmhaas@gmail.com>:
> On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:
>> On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:
>>> xmlagg -> concatenates values to form xml datum
>>> array_agg -> concatenates values to form array datum
>>> ??? -> concatenates values to form string datum
>>
>> concat_agg().
>
> I like that one...
>

why is concat_agg better than listagg ?

Pavel

> ...Robert
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

26 January 2010, 07:08:33

On Jan 25, 2010, at 23:14, Pavel Stehule <pavel.stehule@gmail.com>  
wrote:

> why is concat_agg better than listagg ?

Because it's an aggregate that cocatenates values. It's not an  
aggregate that lists things. I also like concat_agg better than  
string_agg because it's not limited to acting on strings.

Best,

David

Re: Review: listagg aggregate

From

Alastair Turner

Date:

26 January 2010, 07:24:13

On Tue, Jan 26, 2010 at 1:08 PM, David E. Wheeler <david@kineticode.com> wrote:

.....

>
> Because it's an aggregate that cocatenates values. It's not an aggregate
> that lists things. I also like concat_agg better than string_agg because
> it's not limited to acting on strings.
>

.....

Given that it potentially produces a delimited list, not a straight
conacatenation (and that list is unacceptable since it would be
descriptive as a noun but not as a verb) would implode_agg not be the
most descriptive name?

Re: Review: listagg aggregate

From

Alastair Turner

Date:

26 January 2010, 07:44:48

On Tue, Jan 26, 2010 at 1:23 PM, Alastair Turner <bell@ctrlf5.co.za> wrote:
> .....
>
> Given that it potentially produces a delimited list, not a straight
> conacatenation (and that list is unacceptable since it would be
> descriptive as a noun but not as a verb) would implode_agg not be the
> most descriptive name?
>

Actually, scratch that. The other *agg functions are named for what
they produce as output. Not for their process - as per the objection
to list_agg and suggestions of conact_agg and implode_agg. string_agg
would be consistent, which is a wonderful thing if you can get it in a
naming scheme.

Re: Review: listagg aggregate

From

Peter Eisentraut

Date:

26 January 2010, 08:03:42

On tis, 2010-01-26 at 03:08 -0800, David E. Wheeler wrote:
> On Jan 25, 2010, at 23:14, Pavel Stehule <pavel.stehule@gmail.com>  
> wrote:
> 
> > why is concat_agg better than listagg ?
> 
> Because it's an aggregate that cocatenates values. It's not an  
> aggregate that lists things. I also like concat_agg better than  
> string_agg because it's not limited to acting on strings.

What else can it act on?

Re: Review: listagg aggregate

From

Robert Haas

Date:

26 January 2010, 10:15:48

On Tue, Jan 26, 2010 at 2:14 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2010/1/25 Robert Haas <robertmhaas@gmail.com>:
>> On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:
>>> On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:
>>>> xmlagg -> concatenates values to form xml datum
>>>> array_agg -> concatenates values to form array datum
>>>> ??? -> concatenates values to form string datum
>>>
>>> concat_agg().
>>
>> I like that one...
>
> why is concat_agg better than listagg ?

Because it doesn't make lists.

Honestly, I don't love concat_agg() either - why should something need
to have agg in the name just because it's an aggregate?  I think the
most descriptive name would be something like
concatenate_with_separator(), but that's kind of long.

...Robert

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

26 January 2010, 10:56:21

2010/1/26 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Jan 26, 2010 at 2:14 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> 2010/1/25 Robert Haas <robertmhaas@gmail.com>:
>>> On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:
>>>> On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:
>>>>> xmlagg -> concatenates values to form xml datum
>>>>> array_agg -> concatenates values to form array datum
>>>>> ??? -> concatenates values to form string datum
>>>>
>>>> concat_agg().
>>>
>>> I like that one...
>>
>> why is concat_agg better than listagg ?
>
> Because it doesn't make lists.
>
> Honestly, I don't love concat_agg() either - why should something need
> to have agg in the name just because it's an aggregate?  I think the
> most descriptive name would be something like
> concatenate_with_separator(), but that's kind of long.

This is never ending story :)

MySQL has function concate_ws - but this function has different semantic.

I thing so string_agg is short, and from one view consistent

Pavel

>
> ...Robert
>

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

26 January 2010, 11:24:26

On Jan 26, 2010, at 4:03, Peter Eisentraut <peter_e@gmx.net> wrote:

> What else can it act on?

Any data type, since they all can be converted to text. Integers would  
be a common choice.

David

Re: Review: listagg aggregate

From

Tom Lane

Date:

26 January 2010, 12:04:13

Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2010/1/25 Robert Haas <robertmhaas@gmail.com>:
>> On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:
>>> concat_agg().
>> 
>> I like that one...

> why is concat_agg better than listagg ?

It isn't ... it's the wrong part of speech.  "concat"enate is a verb,
whereas the other functions we would like it to be named parallel to
are using nouns there.

(Yes, I know "array" can be used as a verb, but I don't think anyone
reads it that way in "array_agg"...)
        regards, tom lane

Re: Review: listagg aggregate

From

"Kevin Grittner"

Date:

26 January 2010, 12:16:00

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> why is concat_agg better than listagg ?
> 
> It isn't ... it's the wrong part of speech.  "concat"enate is a
> verb,
Concatenation is a noun.  "concat" doesn't get far enough to
distinguish.
-Kevin

testing cvs HEAD - HS/SR - missing file

From

"Erik Rijkers"

Date:

26 January 2010, 13:19:30

I don't know if it has anything to do with HS/SR, I haven't tried it on a single CVS vanilla
installation (yet).

To test with HS/SR, I've setup three 9.0devel instances (cvs as of today) on a single machine, one
as a primary, and two as slaves.  I used the instructions in
 http://wiki.postgresql.org/wiki/Streaming_Replication

and

./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary \ --with-pgport=6565 --quiet --enable-depend
--with-openssl--with-perl \ --with-libxml --with-libxslt
 


These seem to work/replicate well.


Now, when restoring a 700MB dump (made with a 9.0devel pg_dump + pg_restore) into the primary,
errors like the following occur, on all three instances:

FATAL:  could not open file "pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such
file or directory

So, there are three logfiles: pgsql.sr_primary/logfile pgsql.sr_slavery/logfile pgsql.sr_slave02/logfile

$ grep -E 'FATAL|ERROR' pgsql.sr_*/logfile
pgsql.sr_primary/logfile:ERROR:  canceling autovacuum task
pgsql.sr_primary/logfile:ERROR:  canceling autovacuum task
pgsql.sr_primary/logfile:FATAL:  could not open file "pg_xlog/0000000100000001000000FF" (log file
1, segment 255): No such file or directory
pgsql.sr_primary/logfile:FATAL:  could not open file "pg_xlog/0000000100000001000000FF" (log file
1, segment 255): No such file or directory
pgsql.sr_primary/logfile:ERROR:  canceling autovacuum task
pgsql.sr_primary/logfile:ERROR:  canceling autovacuum task
pgsql.sr_slave02/logfile:ERROR:  could not read xlog records: FATAL:  could not open file
"pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory
pgsql.sr_slavery/logfile:ERROR:  could not read xlog records: FATAL:  could not open file
"pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory


This has happened several times, always 'segment' 255, 'log file'  1, 3 or 4.


hth,

Erik Rijkers

Re: Review: listagg aggregate

From

Tom Lane

Date:

26 January 2010, 13:36:41

"David E. Wheeler" <david@kineticode.com> writes:
> Because it's an aggregate that cocatenates values. It's not an  
> aggregate that lists things. I also like concat_agg better than  
> string_agg because it's not limited to acting on strings.

But what it *produces* is a string.  For comparison, the
SQL-standard-specified array_agg produces arrays, but what it
acts on isn't an array.
        regards, tom lane

Re: testing cvs HEAD - HS/SR - missing file

From

"Kevin Grittner"

Date:

26 January 2010, 13:39:36

"Erik Rijkers" <er@xs4all.nl> wrote:
> FATAL:  could not open file "pg_xlog/0000000100000001000000FF"
> (log file 1, segment 255): No such file or directory
Yeah, log file segment numbers skip FF; they go from FE to 00 with
the "log file" number (the middle part) bumped by one.  Whatever is
coming up with that segment number is wrong.  (Unless this is
changing in 9.0?)
-Kevin

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

26 January 2010, 14:04:36

On Jan 26, 2010, at 9:36 AM, Tom Lane wrote:

> But what it *produces* is a string.  For comparison, the
> SQL-standard-specified array_agg produces arrays, but what it
> acts on isn't an array.

Meh. This is all just bike-shedding. I'm fine with string_agg(), though in truth none of the names has really been
great.The inclusion of "agg" in the name is unfortunate. 

I'll have a look at Pavel's new patch now.

David

Re: Review: listagg aggregate

From

Tom Lane

Date:

26 January 2010, 14:09:29

"David E. Wheeler" <david@kineticode.com> writes:
> Meh. This is all just bike-shedding. I'm fine with string_agg(), though in truth none of the names has really been
great.The inclusion of "agg" in the name is unfortunate.
 

Yeah, I wouldn't be for it either if it weren't for the precedent of
array_agg.  I was quite surprised that the SQL committee chose that
name, because they've avoided using the term "aggregate function" at
all, but there it is ...
        regards, tom lane

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

26 January 2010, 14:15:03

On Jan 25, 2010, at 6:56 AM, Pavel Stehule wrote:

> actualised patch - the name is string_agg

All looks fine except I'm getting this error during initdb:

creating template1 database in /usr/local/pgsql-devel/data/base/1 ... FATAL:  could not create unique index
"pg_proc_oid_index"
DETAIL:  Key (oid)=(3031) is duplicated.
child process exited with exit code 1

Would you mind re-submitting with unique OIDs?

Thanks,

David

Re: Review: listagg aggregate

From

Robert Haas

Date:

26 January 2010, 14:15:38

On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David E. Wheeler" <david@kineticode.com> writes:
>> Because it's an aggregate that cocatenates values. It's not an
>> aggregate that lists things. I also like concat_agg better than
>> string_agg because it's not limited to acting on strings.
>
> But what it *produces* is a string.  For comparison, the
> SQL-standard-specified array_agg produces arrays, but what it
> acts on isn't an array.

This point is well-taken, but naming it string_agg() because it
produces a string doesn't seem quite descriptive enough.  We might
someday (if we don't already) have a number of aggregates that produce
an output that is a string; we can't name them all by the output type.

...Robert

Re: Review: listagg aggregate

From

Tom Lane

Date:

26 January 2010, 14:38:53

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But what it *produces* is a string. �For comparison, the
>> SQL-standard-specified array_agg produces arrays, but what it
>> acts on isn't an array.

> This point is well-taken, but naming it string_agg() because it
> produces a string doesn't seem quite descriptive enough.  We might
> someday (if we don't already) have a number of aggregates that produce
> an output that is a string; we can't name them all by the output type.

True, but the same point could be made against array_agg, and that
didn't stop the committee from choosing that name.  As long as
string_agg is the "most obvious" aggregate-to-string functionality,
which ISTM it is, I think it's all right for it to have pride of place
in naming.
        regards, tom lane

Re: testing cvs HEAD - HS/SR - missing file

From

Fujii Masao

Date:

26 January 2010, 15:03:05

On Wed, Jan 27, 2010 at 2:19 AM, Erik Rijkers <er@xs4all.nl> wrote:
> Now, when restoring a 700MB dump (made with a 9.0devel pg_dump + pg_restore) into the primary,
> errors like the following occur, on all three instances:
>
> FATAL:  could not open file "pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such
> file or directory

Thanks for the report!

This is the bug of SR :( I think that walsender wrongly treats the WAL-boundary.

> pgsql.sr_slave02/logfile:ERROR:  could not read xlog records: FATAL:  could not open file
> "pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory
> pgsql.sr_slavery/logfile:ERROR:  could not read xlog records: FATAL:  could not open file
> "pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory

Also the ReadRecord() or its surrounding functions seem to have treated
wrongly the WAL-boundary.

I'll fix those bugs after a night's sleep.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: testing cvs HEAD - HS/SR - missing file

From

Fujii Masao

Date:

26 January 2010, 23:46:45

On Wed, Jan 27, 2010 at 4:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> This is the bug of SR :( I think that walsender wrongly treats the WAL-boundary.

The attached patch would fix this bug.

>> pgsql.sr_slave02/logfile:ERROR:  could not read xlog records: FATAL:  could not open file
>> "pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory
>> pgsql.sr_slavery/logfile:ERROR:  could not read xlog records: FATAL:  could not open file
>> "pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory
>
> Also the ReadRecord() or its surrounding functions seem to have treated
> wrongly the WAL-boundary.

Oops! I've misread the log messages. This is not the bug.
Because of the FATAL error from the primary, the standby
seems to just emit an ERROR message, and exit.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

sr_xlogid_boundary.patch

Re: testing cvs HEAD - HS/SR - missing file

From

Heikki Linnakangas

Date:

27 January 2010, 06:06:11

Fujii Masao wrote:
> *** a/src/backend/replication/walsender.c
> --- b/src/backend/replication/walsender.c
> ***************
> *** 661,666 **** XLogSend(StringInfo outMsg)
> --- 661,673 ----
>   
>           sentPtr = endptr;
>   
> +         if (sentPtr.xrecoff >= XLogFileSize)
> +         {
> +             /* crossing a logid boundary */
> +             sentPtr.xlogid += 1;
> +             sentPtr.xrecoff = 0;
> +         }
> + 
>           /*
>            * Read the log directly into the output buffer to prevent
>            * extra memcpy calls.
> 

Before that, endptr is advanced using XLByteAdvance() macro, which does
handle xlogid boundaries. Is XLByteAdvance() broken?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com

Re: testing cvs HEAD - HS/SR - missing file

From

Fujii Masao

Date:

27 January 2010, 06:46:50

On Wed, Jan 27, 2010 at 7:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Before that, endptr is advanced using XLByteAdvance() macro, which does
> handle xlogid boundaries. Is XLByteAdvance() broken?

No. The cause of the bug is that endptr might be set to the SendRqstPtr
that has crossed a xlogid boundary in the following code.

>        /* if we went beyond SendRqstPtr, back off */
>        if (XLByteLT(SendRqstPtr, endptr))
>            endptr = SendRqstPtr;

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: testing cvs HEAD - HS/SR - missing file

From

Heikki Linnakangas

Date:

27 January 2010, 07:38:01

Fujii Masao wrote:
> On Wed, Jan 27, 2010 at 7:05 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Before that, endptr is advanced using XLByteAdvance() macro, which does
>> handle xlogid boundaries. Is XLByteAdvance() broken?
> 
> No. The cause of the bug is that endptr might be set to the SendRqstPtr
> that has crossed a xlogid boundary in the following code.
> 
>>         /* if we went beyond SendRqstPtr, back off */
>>         if (XLByteLT(SendRqstPtr, endptr))
>>             endptr = SendRqstPtr;

But SendRqstPtr comes from LogwrtResult.Write, surely that's correct, no?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com

Re: testing cvs HEAD - HS/SR - missing file

From

Fujii Masao

Date:

27 January 2010, 09:12:10

On Wed, Jan 27, 2010 at 8:37 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> But SendRqstPtr comes from LogwrtResult.Write, surely that's correct, no?

Right. But the point is that LogwrtResult.Write might indicate 0/FF000000
because it's the last byte + 1 written out. XLogRead() treats this as the
location where WAL begins to be read, so converts it to the WAL file name
0000000100000000000000FF by using XLByteToSeg.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

27 January 2010, 11:58:31

with actualised oids

Regards
Pavel Stehule

2010/1/26 David E. Wheeler <david@kineticode.com>:
> On Jan 25, 2010, at 6:56 AM, Pavel Stehule wrote:
>
>> actualised patch - the name is string_agg
>
> All looks fine except I'm getting this error during initdb:
>
> creating template1 database in /usr/local/pgsql-devel/data/base/1 ... FATAL:  could not create unique index
"pg_proc_oid_index"
> DETAIL:  Key (oid)=(3031) is duplicated.
> child process exited with exit code 1
>
> Would you mind re-submitting with unique OIDs?
>
> Thanks,
>
> David

Attachment

string_agg.diff

Re: testing cvs HEAD - HS/SR - missing file

From

Heikki Linnakangas

Date:

27 January 2010, 12:38:20

Fujii Masao wrote:
> On Wed, Jan 27, 2010 at 8:37 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> But SendRqstPtr comes from LogwrtResult.Write, surely that's correct, no?
> 
> Right. But the point is that LogwrtResult.Write might indicate 0/FF000000
> because it's the last byte + 1 written out. XLogRead() treats this as the
> location where WAL begins to be read, so converts it to the WAL file name
> 0000000100000000000000FF by using XLByteToSeg.

Ah, I see it now, thanks. How confusing.

Your patch clearly works, but I wonder how we should logically treat
that boundary value. You're setting sentPtr to the beginning of the
logid, when LogwrtResult.Write is 0/FF000000. So you're logically
thinking that the non-existent FF log segment has been sent to the
standby as soon as the previous segment has been fully sent. Another
option would be to set SendRqstPtr to the beginning of next logid, when
LogwrtResult.Write is 0/FF000000.. Yet another option would be to set
startptr to the beginning of next logid, when sentPtr is 0/FF000000:

***************
*** 633,638 ****
--- 633,648 ----          * WAL record.          */         startptr = sentPtr;
+         if (startptr.xrecoff >= XLogFileSize)
+         {
+             /*
+              * crossing a logid boundary, skip the non-existent last log
+              * segment in previous logical log file.
+              */
+             startptr.xlogid += 1;
+             startptr.xrecoff = 0;
+         }
+         endptr = startptr;         XLByteAdvance(endptr, MAX_SEND_SIZE);         /* round down to page boundary. */

I think I prefer that, it feels logically more correct. Setting sentPtr
beyond LogwrtResult.Write feels wrong, even though 0/FF000000 and
1/00000000 are really the same physical location. I'll commit it that way.

I also note that I broke that same thing in ReadRecord() in the
retry-logic patch I just comitted. I'll fix that too.

Thanks for the testing, Erik!

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com

Re: Review: listagg aggregate

From

Robert Haas

Date:

27 January 2010, 18:09:27

On Tue, Jan 26, 2010 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> But what it *produces* is a string.  For comparison, the
>>> SQL-standard-specified array_agg produces arrays, but what it
>>> acts on isn't an array.
>
>> This point is well-taken, but naming it string_agg() because it
>> produces a string doesn't seem quite descriptive enough.  We might
>> someday (if we don't already) have a number of aggregates that produce
>> an output that is a string; we can't name them all by the output type.
>
> True, but the same point could be made against array_agg, and that
> didn't stop the committee from choosing that name.  As long as
> string_agg is the "most obvious" aggregate-to-string functionality,
> which ISTM it is, I think it's all right for it to have pride of place
> in naming.

Maybe so, but personally, I'd still prefer something more descriptive.

...Robert

Re: testing cvs HEAD - HS/SR - missing file

From

"Erik Rijkers"

Date:

27 January 2010, 20:33:15

On Wed, January 27, 2010 17:38, Heikki Linnakangas wrote:

> I'll commit it that way.

This is just to let you know that these commits seem to have solved the bug.

I repeated the original test (700 MB pg_restore into a primary,
with two replicating slaves, on one machine).

Checking afterwards:
$ for port in 6565 6566 6567;
do psql -qtAp $port -c "select pg_database_size('${PGDATABASE}'), version()";
done
17057193792|PostgreSQL 9.0devel-sr_primary [...]
17057193792|PostgreSQL 9.0devel-sr_slavery
17057193792|PostgreSQL 9.0devel-sr_slave02

So that looked promising :)  To make sure I also COPYd
all tables to file and diff'ed them. There were indeed
no differences.




I did get at one point in the sr_slave02 log: (consecutive lines)

LOG:  consistent recovery state reached at 0/2000000
LOG:  database system is ready to accept read only connections
LOG:  could not send data to client: Broken pipe
LOG:  unexpected EOF on client connection

but apparently this is not problematic?



HS/SR is a fantastic set of features.  I'll keep hammering away a bit at the dynamic duo; if you
have specific testing ideas, let me know.

Thanks!


Erik Rijkers

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

27 January 2010, 20:45:21

On Jan 27, 2010, at 7:58 AM, Pavel Stehule wrote:

> with actualised oids

Thanks. Looks good, modulo my preference for concat_agg(). I'll mark it ready for committer.

Best,

David

Re: testing cvs HEAD - HS/SR - missing file

From

Fujii Masao

Date:

27 January 2010, 21:09:29

On Thu, Jan 28, 2010 at 9:32 AM, Erik Rijkers <er@xs4all.nl> wrote:
> On Wed, January 27, 2010 17:38, Heikki Linnakangas wrote:
>
>> I'll commit it that way.

Thanks Heikki! Yeah, your approach makes more sense.

> This is just to let you know that these commits seem to have solved the bug.

Good! Thanks Erik!

> I did get at one point in the sr_slave02 log: (consecutive lines)
>
> LOG:  consistent recovery state reached at 0/2000000
> LOG:  database system is ready to accept read only connections
> LOG:  could not send data to client: Broken pipe
> LOG:  unexpected EOF on client connection
>
> but apparently this is not problematic?

It seems to me that you've unexpectedly killed (e.g., kill -9)
a client connected to the standby during executing a read-only
query.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Review: listagg aggregate

From

Takahiro Itagaki

Date:

27 January 2010, 22:46:08

Pavel Stehule <pavel.stehule@gmail.com> wrote:

> with actualised oids

I'm checking the patch for commit, and have a couple of comments.

* I think we cannot cache the delimiter at the first call.
  For example,
    SELECT string_agg(elem, delim)
      FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
  should return 'A+B*C' rather than 'A,B,C'.

* Can we use StringInfo directly as the aggregate context instead of
  StringAggState? For the first reason, we need to drop 'delimiter' field
  from struct StringAggState. Now it has only StringInfo field.

* We'd better avoiding to call text_to_cstring() for delimitors and elements
  for performance reason. We can use appendBinaryStringInfo() here.

My proposal patch attached.

Also, I've not changed it yet, but it might be considerable:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
  They are almost "internal names", but we could have more
  like string_agg_with_sep_transfn.

Comments?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachment

string_agg_20100128.patch

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

27 January 2010, 23:28:45

On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

> * I think we cannot cache the delimiter at the first call.
>  For example,
>    SELECT string_agg(elem, delim)
>      FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
>  should return 'A+B*C' rather than 'A,B,C'.

Ooh, nice.

> * Can we use StringInfo directly as the aggregate context instead of
>  StringAggState? For the first reason, we need to drop 'delimiter' field
>  from struct StringAggState. Now it has only StringInfo field.

Makes sense.

> * We'd better avoiding to call text_to_cstring() for delimitors and elements
>  for performance reason. We can use appendBinaryStringInfo() here.
>
> My proposal patch attached.
>
> Also, I've not changed it yet, but it might be considerable:
>
> * Do we need better names for string_agg1_transfn and string_agg2_transfn?
>  They are almost "internal names", but we could have more
>  like string_agg_with_sep_transfn.

Yes please.

> Comments?

Patch looks great, thank you!

David

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 04:37:58

2010/1/28 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:
>
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> with actualised oids
>
> I'm checking the patch for commit, and have a couple of comments.
>
> * I think we cannot cache the delimiter at the first call.
>  For example,
>    SELECT string_agg(elem, delim)
>      FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
>  should return 'A+B*C' rather than 'A,B,C'.

no I dislike it. This using is nonsense.

Regards
Pavel

>
> * Can we use StringInfo directly as the aggregate context instead of
>  StringAggState? For the first reason, we need to drop 'delimiter' field
>  from struct StringAggState. Now it has only StringInfo field.
>
> * We'd better avoiding to call text_to_cstring() for delimitors and elements
>  for performance reason. We can use appendBinaryStringInfo() here.
>
> My proposal patch attached.
>
> Also, I've not changed it yet, but it might be considerable:
>
> * Do we need better names for string_agg1_transfn and string_agg2_transfn?
>  They are almost "internal names", but we could have more
>  like string_agg_with_sep_transfn.
>
> Comments?
>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
>
>

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 04:38:32

2010/1/28 David E. Wheeler <david@kineticode.com>:
> On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:
>
>> * I think we cannot cache the delimiter at the first call.
>>  For example,
>>    SELECT string_agg(elem, delim)
>>      FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
>>  should return 'A+B*C' rather than 'A,B,C'.
>
> Ooh, nice.
>
>> * Can we use StringInfo directly as the aggregate context instead of
>>  StringAggState? For the first reason, we need to drop 'delimiter' field
>>  from struct StringAggState. Now it has only StringInfo field.
>
> Makes sense.

no, has not.

Pavel

>
>> * We'd better avoiding to call text_to_cstring() for delimitors and elements
>>  for performance reason. We can use appendBinaryStringInfo() here.
>>
>> My proposal patch attached.
>>
>> Also, I've not changed it yet, but it might be considerable:
>>
>> * Do we need better names for string_agg1_transfn and string_agg2_transfn?
>>  They are almost "internal names", but we could have more
>>  like string_agg_with_sep_transfn.
>
> Yes please.
>
>> Comments?
>
> Patch looks great, thank you!
>
> David
>
>
>

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 04:40:14

2010/1/28 Pavel Stehule <pavel.stehule@gmail.com>:
> 2010/1/28 David E. Wheeler <david@kineticode.com>:
>> On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:
>>
>>> * I think we cannot cache the delimiter at the first call.
>>>  For example,
>>>    SELECT string_agg(elem, delim)
>>>      FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
>>>  should return 'A+B*C' rather than 'A,B,C'.
>>
>> Ooh, nice.
>>
>>> * Can we use StringInfo directly as the aggregate context instead of
>>>  StringAggState? For the first reason, we need to drop 'delimiter' field
>>>  from struct StringAggState. Now it has only StringInfo field.
>>
>> Makes sense.
>
> no, has not.

What is use case for this behave??

Pavel


>
> Pavel
>
>>
>>> * We'd better avoiding to call text_to_cstring() for delimitors and elements
>>>  for performance reason. We can use appendBinaryStringInfo() here.
>>>
>>> My proposal patch attached.
>>>
>>> Also, I've not changed it yet, but it might be considerable:
>>>
>>> * Do we need better names for string_agg1_transfn and string_agg2_transfn?
>>>  They are almost "internal names", but we could have more
>>>  like string_agg_with_sep_transfn.
>>
>> Yes please.
>>
>>> Comments?
>>
>> Patch looks great, thank you!
>>
>> David
>>
>>
>>
>

Re: Review: listagg aggregate

From

Takahiro Itagaki

Date:

28 January 2010, 04:58:10

Pavel Stehule <pavel.stehule@gmail.com> wrote:

> 2010/1/28 Pavel Stehule <pavel.stehule@gmail.com>:
> > 2010/1/28 David E. Wheeler <david@kineticode.com>:
> >> On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:
> >>
> >>> * I think we cannot cache the delimiter at the first call.
> >>>   For example,
> >>>     SELECT string_agg(elem, delim)
> >>>       FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
> >>>   should return 'A+B*C' rather than 'A,B,C'.
> >
> > no, has not.
> What is use case for this behave??

I also think this usage is nonsense, but seems to be the most consistent
behavior for me. I didn't say anything about use-cases, but just capability.
Since we allow such kinds of usage for now, you need to verify the
delimiter is not changed rather than ignoring it if you want disallow
to change the delimiter during an aggregation.

Of course you can cache the first delimiter at start, and check delimiters
are not changed every calls -- but I think it is just a waste of cpu cycle.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Re: Review: listagg aggregate

From

Robert Haas

Date:

28 January 2010, 09:53:43

On Thu, Jan 28, 2010 at 3:59 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> 2010/1/28 Pavel Stehule <pavel.stehule@gmail.com>:
>> > 2010/1/28 David E. Wheeler <david@kineticode.com>:
>> >> On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:
>> >>
>> >>> * I think we cannot cache the delimiter at the first call.
>> >>>   For example,
>> >>>     SELECT string_agg(elem, delim)
>> >>>       FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
>> >>>   should return 'A+B*C' rather than 'A,B,C'.
>> >
>> > no, has not.
>> What is use case for this behave??
>
> I also think this usage is nonsense, but seems to be the most consistent
> behavior for me. I didn't say anything about use-cases, but just capability.
> Since we allow such kinds of usage for now, you need to verify the
> delimiter is not changed rather than ignoring it if you want disallow
> to change the delimiter during an aggregation.
>
> Of course you can cache the first delimiter at start, and check delimiters
> are not changed every calls -- but I think it is just a waste of cpu cycle.

Agreed.  Not caching it seems the simplest solution.

...Robert

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 10:01:35

2010/1/28 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Jan 28, 2010 at 3:59 AM, Takahiro Itagaki
> <itagaki.takahiro@oss.ntt.co.jp> wrote:
>> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>>> 2010/1/28 Pavel Stehule <pavel.stehule@gmail.com>:
>>> > 2010/1/28 David E. Wheeler <david@kineticode.com>:
>>> >> On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:
>>> >>
>>> >>> * I think we cannot cache the delimiter at the first call.
>>> >>>   For example,
>>> >>>     SELECT string_agg(elem, delim)
>>> >>>       FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
>>> >>>   should return 'A+B*C' rather than 'A,B,C'.
>>> >
>>> > no, has not.
>>> What is use case for this behave??
>>
>> I also think this usage is nonsense, but seems to be the most consistent
>> behavior for me. I didn't say anything about use-cases, but just capability.
>> Since we allow such kinds of usage for now, you need to verify the
>> delimiter is not changed rather than ignoring it if you want disallow
>> to change the delimiter during an aggregation.
>>
>> Of course you can cache the first delimiter at start, and check delimiters
>> are not changed every calls -- but I think it is just a waste of cpu cycle.
>
> Agreed.  Not caching it seems the simplest solution.

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Pavel



>
> ...Robert
>

Re: Review: listagg aggregate

From

Robert Haas

Date:

28 January 2010, 10:17:14

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> simplest could not be a best. There have to be only a const
> expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation.  I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

...Robert

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 11:32:47

2010/1/28 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> simplest could not be a best. There have to be only a const
>> expression. But we have not possibility to check it in pg.
>
> Well... that's an entirely arbitrary limitation.  I admit that it
> doesn't seem likely that someone would want to have a variable
> delimiter, but putting extra effort and code complexity into
> preventing it seems pointless.

It is only a few lines with zero complexity.

The main issue of Takahiro proposal is  "unclean" behave.

we can have a content

c1    c2
-----------
c11, c12,
c21, c22

and result of string_agg(c1, c2)

have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
will be NULL ?? I checked oracle. Oracle doesn't allow variable as
delimiter. We can't check it. But we can fix first value and using it
as constant.

More - Takahiro proposal has one performance gain. It have to deTOAST
separator value in every cycle.

Takahiro has true - we can store length of separator and we can add
separator to cumulative value as binary value.

I prefer original behave with  note in documentation - so as separator
is used a value on first row, when is used expression and not
constant.

Regards
Pavel

>
> ...Robert
>

Re: Review: listagg aggregate

From

Tom Lane

Date:

28 January 2010, 11:57:20

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> simplest could not be a best. There have to be only a const
>> expression. But we have not possibility to check it in pg.

> Well... that's an entirely arbitrary limitation.  I admit that it
> doesn't seem likely that someone would want to have a variable
> delimiter, but putting extra effort and code complexity into
> preventing it seems pointless.

Yeah.  The real issue here is that in some cases you'd like to have
non-aggregated parameters to an aggregate, but SQL has no notation
to express that.

I think Pavel's underlying complaint is that if the delimiter
argument isn't constant, then we're exposing an implementation
dependency in terms of just which values get separated by which
delimiters.  The most practical implementation seems to be that
the first-call delimiter isn't actually used at all, and on
subsequent calls the delimiter *precedes* the associated value,
which is a bit surprising given the order in which one writes
them.  Not sure if this is worth documenting though.  Those two
or three people who actually try it will figure it out soon enough.
        regards, tom lane

Re: Review: listagg aggregate

From

Robert Haas

Date:

28 January 2010, 12:01:37

On Thu, Jan 28, 2010 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> simplest could not be a best. There have to be only a const
>>> expression. But we have not possibility to check it in pg.
>
>> Well... that's an entirely arbitrary limitation.  I admit that it
>> doesn't seem likely that someone would want to have a variable
>> delimiter, but putting extra effort and code complexity into
>> preventing it seems pointless.
>
> Yeah.  The real issue here is that in some cases you'd like to have
> non-aggregated parameters to an aggregate, but SQL has no notation
> to express that.

Right.

> I think Pavel's underlying complaint is that if the delimiter
> argument isn't constant, then we're exposing an implementation
> dependency in terms of just which values get separated by which
> delimiters.  The most practical implementation seems to be that
> the first-call delimiter isn't actually used at all, and on
> subsequent calls the delimiter *precedes* the associated value,
> which is a bit surprising given the order in which one writes
> them.  Not sure if this is worth documenting though.  Those two
> or three people who actually try it will figure it out soon enough.

Yeah, I'm thoroughly unworried about it.

...Robert

Re: Review: listagg aggregate

From

Robert Haas

Date:

28 January 2010, 12:02:35

On Wed, Jan 27, 2010 at 9:47 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
> * Do we need better names for string_agg1_transfn and string_agg2_transfn?
>  They are almost "internal names", but we could have more
>  like string_agg_with_sep_transfn.

Surely the names of the transition and final functions should match
the name of the aggregate.  But if there's a proposal on the table to
call this string_app_with_sep() instead of just string_agg(), +1 from
me.

...Robert

Re: Review: listagg aggregate

From

Hitoshi Harada

Date:

28 January 2010, 13:08:54

2010/1/29 Pavel Stehule <pavel.stehule@gmail.com>:
> 2010/1/28 Robert Haas <robertmhaas@gmail.com>:
>> On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> simplest could not be a best. There have to be only a const
>>> expression. But we have not possibility to check it in pg.
>>
>> Well... that's an entirely arbitrary limitation.  I admit that it
>> doesn't seem likely that someone would want to have a variable
>> delimiter, but putting extra effort and code complexity into
>> preventing it seems pointless.
>
> It is only a few lines with zero complexity.
>
> The main issue of Takahiro proposal is  "unclean" behave.
>
> we can have a content
>
> c1    c2
> -----------
> c11, c12,
> c21, c22
>
> and result of string_agg(c1, c2)
>
> have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
> will be NULL ?? I checked oracle. Oracle doesn't allow variable as
> delimiter. We can't check it. But we can fix first value and using it
> as constant.

What about get_fn_expr_arg_stable() to check if the argument is stable
during aggregate?

Regards,


--
Hitoshi Harada

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 13:13:03

2010/1/28 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Jan 28, 2010 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>>> simplest could not be a best. There have to be only a const
>>>> expression. But we have not possibility to check it in pg.
>>
>>> Well... that's an entirely arbitrary limitation.  I admit that it
>>> doesn't seem likely that someone would want to have a variable
>>> delimiter, but putting extra effort and code complexity into
>>> preventing it seems pointless.
>>
>> Yeah.  The real issue here is that in some cases you'd like to have
>> non-aggregated parameters to an aggregate, but SQL has no notation
>> to express that.
>
> Right.
>
>> I think Pavel's underlying complaint is that if the delimiter
>> argument isn't constant, then we're exposing an implementation
>> dependency in terms of just which values get separated by which
>> delimiters.  The most practical implementation seems to be that
>> the first-call delimiter isn't actually used at all, and on
>> subsequent calls the delimiter *precedes* the associated value,
>> which is a bit surprising given the order in which one writes
>> them.  Not sure if this is worth documenting though.  Those two
>> or three people who actually try it will figure it out soon enough.
>

ok - there is one query,

in 99.99% the second argument will be a constant. Can we use this
information and optimize function for this case?

The detoast on every row can take some percent from a performance.

Pavel

> Yeah, I'm thoroughly unworried about it.


>
> ...Robert
>

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 13:14:49

2010/1/28 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Jan 27, 2010 at 9:47 PM, Takahiro Itagaki
> <itagaki.takahiro@oss.ntt.co.jp> wrote:
>> * Do we need better names for string_agg1_transfn and string_agg2_transfn?
>>  They are almost "internal names", but we could have more
>>  like string_agg_with_sep_transfn.
>
> Surely the names of the transition and final functions should match
> the name of the aggregate.  But if there's a proposal on the table to
> call this string_app_with_sep() instead of just string_agg(), +1 from
> me.

no, >>>string_app_with_sep<<< is too long for common using.

Pavel

>
> ...Robert
>

Re: Review: listagg aggregate

From

Tom Lane

Date:

28 January 2010, 13:17:35

Pavel Stehule <pavel.stehule@gmail.com> writes:
> in 99.99% the second argument will be a constant. Can we use this
> information and optimize function for this case?

> The detoast on every row can take some percent from a performance.

What detoast?  There won't be one for a constant, nor even for a
variable in any sane situation --- who's going to be using
multi-kilobyte delimiter values?  And if they do, aren't they likely
to run out of memory for the result long before the repeated detoasts
become an interesting problem?  You're arguing about a case that
seems quite irrelevant to the real world.
        regards, tom lane

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 13:18:23

2010/1/28 Hitoshi Harada <umi.tanuki@gmail.com>:
> 2010/1/29 Pavel Stehule <pavel.stehule@gmail.com>:
>> 2010/1/28 Robert Haas <robertmhaas@gmail.com>:
>>> On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>>> simplest could not be a best. There have to be only a const
>>>> expression. But we have not possibility to check it in pg.
>>>
>>> Well... that's an entirely arbitrary limitation.  I admit that it
>>> doesn't seem likely that someone would want to have a variable
>>> delimiter, but putting extra effort and code complexity into
>>> preventing it seems pointless.
>>
>> It is only a few lines with zero complexity.
>>
>> The main issue of Takahiro proposal is  "unclean" behave.
>>
>> we can have a content
>>
>> c1    c2
>> -----------
>> c11, c12,
>> c21, c22
>>
>> and result of string_agg(c1, c2)
>>
>> have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
>> will be NULL ?? I checked oracle. Oracle doesn't allow variable as
>> delimiter. We can't check it. But we can fix first value and using it
>> as constant.
>
> What about get_fn_expr_arg_stable() to check if the argument is stable
> during aggregate?

I newer know so this function exists. Now we can

a) check and allow only stable params
b) when second parameter is stable, then store it and use it as constant.

I prefer a)

Pavel

>
> Regards,
>
>
> --
> Hitoshi Harada
>

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

28 January 2010, 13:20:12

2010/1/28 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> in 99.99% the second argument will be a constant. Can we use this
>> information and optimize function for this case?
>
>> The detoast on every row can take some percent from a performance.
>
> What detoast?  There won't be one for a constant, nor even for a
> variable in any sane situation --- who's going to be using
> multi-kilobyte delimiter values?  And if they do, aren't they likely
> to run out of memory for the result long before the repeated detoasts
> become an interesting problem?  You're arguing about a case that
> seems quite irrelevant to the real world.
>

I asking

with get_fn_expr_arg_stable() we are able to fix second parameter
without some performance issues.

Regards
Pavel


>                        regards, tom lane
>

Re: Review: listagg aggregate

From

Tom Lane

Date:

28 January 2010, 13:21:50

Hitoshi Harada <umi.tanuki@gmail.com> writes:
> What about get_fn_expr_arg_stable() to check if the argument is stable
> during aggregate?

Seems quite expensive (possible catalog lookups) and there still isn't
any strong argument why we should bother.
        regards, tom lane

Re: Review: listagg aggregate

From

Tom Lane

Date:

28 January 2010, 13:25:07

Pavel Stehule <pavel.stehule@gmail.com> writes:
> with get_fn_expr_arg_stable() we are able to fix second parameter
> without some performance issues.

No, that will create its own performance issues ---
get_fn_expr_arg_stable isn't especially cheap.
If there were a really strong reason why we had to do it,
then I'd agree, but frankly the argument for disallowing
a variable delimiter is too thin.
        regards, tom lane

Re: Review: listagg aggregate

From

Marko Tiikkaja

Date:

28 January 2010, 13:29:22

On 2010-01-28 19:17, Pavel Stehule wrote:
> 2010/1/28 Hitoshi Harada <umi.tanuki@gmail.com>:
>> What about get_fn_expr_arg_stable() to check if the argument is stable
>> during aggregate?
> 
> I newer know so this function exists. Now we can
> 
> a) check and allow only stable params
> b) when second parameter is stable, then store it and use it as constant.
> 
> I prefer a)

Someone might have a perfectly good use case for using different
delimiters.  I don't think it's a good idea to be artificially limiting
what you can and can't do.


Regards,
Marko Tiikkaja

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

28 January 2010, 13:30:44

On Jan 28, 2010, at 9:29 AM, Marko Tiikkaja wrote:

> Someone might have a perfectly good use case for using different
> delimiters.  I don't think it's a good idea to be artificially limiting
> what you can and can't do.

+1

David

Re: Review: listagg aggregate

From

Greg Stark

Date:

28 January 2010, 14:58:04

<p>One situation where this could actually matter in the long term is if we want to have an optimization for aggregate
functionswhose state variables can be combined. this could be important if we ever want to do parallel processing
someday.<p>Sowe could have, for example two subjobs build two sublists and then combiner the two lists later. in that
casethe separator might not be the one the user expected - our put another way the one the user expected might not be
availablewhen we need it.<p>We could say this isn't a problem because not all aggregate functions will be amenable to
suchan optimization and perhaps this will just be one of them. Alternately we could just have faith that a solution
willbe found - it doesn't seem like it should be an insoluble problem to me.<p>greg<p><blockquote type="cite">On 28 Jan
201015:57, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /><br /><p><font
color="#500050">RobertHaas <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>> writes:<br /> >
OnThu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel....</font>Yeah.  The real issue here is that in some cases
you'dlike to have<br /> non-aggregated parameters to an aggregate, but SQL has no notation<br /> to express that.<br
/><br/> I think Pavel's underlying complaint is that if the delimiter<br /> argument isn't constant, then we're
exposingan implementation<br /> dependency in terms of just which values get separated by which<br /> delimiters.  The
mostpractical implementation seems to be that<br /> the first-call delimiter isn't actually used at all, and on<br />
subsequentcalls the delimiter *precedes* the associated value,<br /> which is a bit surprising given the order in which
onewrites<br /> them.  Not sure if this is worth documenting though.  Those two<br /> or three people who actually try
itwill figure it out soon enough.<br /><br />                        regards, tom lane<br /><p><font
color="#500050"><br/>-- <br />Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/>To make changes to your
subs...</font></blockquote>

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

29 January 2010, 03:44:01

2010/1/28 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> with get_fn_expr_arg_stable() we are able to fix second parameter
>> without some performance issues.
>
> No, that will create its own performance issues ---
> get_fn_expr_arg_stable isn't especially cheap.
> If there were a really strong reason why we had to do it,
> then I'd agree, but frankly the argument for disallowing
> a variable delimiter is too thin.

it could be called only once. But I agree, so could be better, check
it in parser time.

ok - I am only one who like original behave - so I  am withdrawing.
Robert, If you like, please commit Itagaki's patch. + add note about
behave when second parameter isn't stable.

Regards
Pavel Stehule

>
>                        regards, tom lane
>

Re: Review: listagg aggregate

From

Robert Haas

Date:

29 January 2010, 14:43:17

On Fri, Jan 29, 2010 at 2:43 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2010/1/28 Tom Lane <tgl@sss.pgh.pa.us>:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> with get_fn_expr_arg_stable() we are able to fix second parameter
>>> without some performance issues.
>>
>> No, that will create its own performance issues ---
>> get_fn_expr_arg_stable isn't especially cheap.
>> If there were a really strong reason why we had to do it,
>> then I'd agree, but frankly the argument for disallowing
>> a variable delimiter is too thin.
>
> it could be called only once. But I agree, so could be better, check
> it in parser time.
>
> ok - I am only one who like original behave - so I  am withdrawing.
> Robert, If you like, please commit Itagaki's patch. + add note about
> behave when second parameter isn't stable.

I haven't even looked at this code - I sort of assumed Itagaki was
handling this one.  But it might be good to make sure that the docs
have been read through by a native English speaker prior to commit...

...Robert

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

29 January 2010, 14:45:22

On Jan 29, 2010, at 10:43 AM, Robert Haas wrote:

> I haven't even looked at this code - I sort of assumed Itagaki was
> handling this one.  But it might be good to make sure that the docs
> have been read through by a native English speaker prior to commit...

I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The
documentationfor all the functions on that page could use a little love, frankly. 

Best,

David

Re: Review: listagg aggregate

From

Robert Haas

Date:

29 January 2010, 14:47:03

On Fri, Jan 29, 2010 at 1:45 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Jan 29, 2010, at 10:43 AM, Robert Haas wrote:
>
>> I haven't even looked at this code - I sort of assumed Itagaki was
>> handling this one.  But it might be good to make sure that the docs
>> have been read through by a native English speaker prior to commit...
>
> I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The
documentationfor all the functions on that page could use a little love, frankly. 

Want to take a short at it?

...Robert

Re: Review: listagg aggregate

From

"David E. Wheeler"

Date:

29 January 2010, 14:47:55

On Jan 29, 2010, at 10:46 AM, Robert Haas wrote:

>> I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The
documentationfor all the functions on that page could use a little love, frankly. 
>
> Want to take a short at it?

ENOTUITS! /me is already sorely over-committed…

David

Re: Review: listagg aggregate

From

Takahiro Itagaki

Date:

31 January 2010, 23:28:36

Robert Haas <robertmhaas@gmail.com> wrote:

> > ok - I am only one who like original behave - so I ?am withdrawing.
> > Robert, If you like, please commit Itagaki's patch. + add note about
> > behave when second parameter isn't stable.
> 
> I haven't even looked at this code - I sort of assumed Itagaki was
> handling this one.  But it might be good to make sure that the docs
> have been read through by a native English speaker prior to commit...

Applied with some editorialization. Please check the docs are good enough.

I removed a test pattern for variable delimiters from regression test
because it might be an undefined behavior. We might change the behavior
in the future, so we guarantee only constant delimiters.

The transition functions are renamed to "string_agg_transfn" and
"string_agg_delim_transfn". We cannot use "string_agg_transfn" for
both names because we cast the function name as regproc in bootstrap;
it complains about duplicated function names.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

01 February 2010, 01:08:23

2010/2/1 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:
>
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> > ok - I am only one who like original behave - so I ?am withdrawing.
>> > Robert, If you like, please commit Itagaki's patch. + add note about
>> > behave when second parameter isn't stable.
>>
>> I haven't even looked at this code - I sort of assumed Itagaki was
>> handling this one.  But it might be good to make sure that the docs
>> have been read through by a native English speaker prior to commit...
>
> Applied with some editorialization. Please check the docs are good enough.
>
> I removed a test pattern for variable delimiters from regression test
> because it might be an undefined behavior. We might change the behavior
> in the future, so we guarantee only constant delimiters.
>
> The transition functions are renamed to "string_agg_transfn" and
> "string_agg_delim_transfn". We cannot use "string_agg_transfn" for
> both names because we cast the function name as regproc in bootstrap;
> it complains about duplicated function names.

thank you very much

Pavel Stehule

>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
>
>
>

Re: Review: listagg aggregate

From

Thom Brown

Date:

01 February 2010, 08:12:01

I noticed that the regression test results don't include the following case:<br /><br />select string_agg(a) from
(values(null),(null))g(a);<br /><br /> There is one similar where a delimiter is provided.<br /><br /> Which leads me
toask for clarification, would it be safe to assume that string_agg can never itself return null?<br /><br /> Thom<br
/><br/>

Re: Review: listagg aggregate

From

Pavel Stehule

Date:

01 February 2010, 09:41:04

2010/2/1 Thom Brown <thombrown@gmail.com>:
> I noticed that the regression test results don't include the following case:
>
> select string_agg(a) from (values(null),(null)) g(a);
>
> There is one similar where a delimiter is provided.
>
> Which leads me to ask for clarification, would it be safe to assume that
> string_agg can never itself return null?

if all values are null, then result is null.

Pavel

>
> Thom
>
>

Re: Review: listagg aggregate

From

Thom Brown

Date:

01 February 2010, 09:52:16

On 1 February 2010 13:40, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/2/1 Thom Brown <thombrown@gmail.com>:
> I noticed that the regression test results don't include the following case:
>
> select string_agg(a) from (values(null),(null)) g(a);
>
> There is one similar where a delimiter is provided.
>
> Which leads me to ask for clarification, would it be safe to assume that
> string_agg can never itself return null?

if all values are null, then result is null.

Pavel

Ah, I was looking at the expected results, and couldn't see a NULL outcome, but then these aren't indicated in such results anyway.

Thom

Re: Review: listagg aggregate

From

Robert Haas

Date:

01 February 2010, 11:40:51

On Sun, Jan 31, 2010 at 10:29 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
> Applied with some editorialization. Please check the docs are good enough.

Looks pretty good.  I have committed a couple very minor adjustments.

...Robert