Thread: List Concatination

List Concatination

From
Josh Berkus
Date:
Folks,
I have an interesting problem.  For purpose of presentation to users,
I'd like to concatinate a list of VARCHAR values from a subtable.  To
simplify my actual situation:

CREATE TABLE clients (client_id SERIAL PRIMARY KEY,client_name VARCHAR(50) );

CREATE TABLE client_contacts (contact_id SERIAL PRIMARY KEY,client_id REFERENCES clients(client_id),first_name
VARCHAR(25),last_nameVARCHAR(25) );
 

CLients:
1     McMurphy Assoc.
2    Ross Construction

Contacts
1    1    Jim    Henson
2    1    Pat    O'Gannet
3    2    Sara    Vaugn
3    2    Bill    Murray

etc.

What I'd like to be able to do is present a list of clients and their
comma-seperated contacts in paragraph form, hence:


Client            Contacts
McMurphy Assoc.        Jim Henson, Pat O'Gannet, Kelley Moore

Ross Construction    Sara Vaugn, Bill Murray, Peter Frump,        Siskel Ebert

I can figure out how to do this procedurally (using PL/pgSQL and a
cursor) but not how to do it declaratively (using only SQL).  The reason
it's important to do it declaritively is that there are actually two
sub-tables with thousands of entries and the procedural approach is
rather slow.

Suggestions?
                    -Josh Berkus

-- 
______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology      josh@agliodbs.com   and data management solutions       (415) 565-7293  for law firms, small
businesses      fax  621-2533   and non-profit organizations.       San Francisco
 


Re: List Concatination

From
Richard Huxton
Date:
Josh Berkus wrote:
>         I have an interesting problem.  For purpose of presentation to users,
> I'd like to concatinate a list of VARCHAR values from a subtable.  To
> simplify my actual situation:
> 
> What I'd like to be able to do is present a list of clients and their
> comma-seperated contacts in paragraph form, hence:
> 
> Client                  Contacts
> McMurphy Assoc.         Jim Henson, Pat O'Gannet, Kelley Moore
> 
> Ross Construction       Sara Vaugn, Bill Murray, Peter Frump,
>                         Siskel Ebert
> 
Well, basically you can use a standard join, order it and eliminate
duplicate client names in the application. That's the "proper" way.

But - if you don't care about the order of contacts you can define an
aggregate function:

create aggregate catenate(sfunc1=textcat, basetype=text, stype1=text, initcond1='');

Then group by client and catenate(firstname || ' ' || lastname)

You'll want to read the CREATE AGGREGATE page in the reference manual,
replace textcat with your own routine that adds a comma and you'll need
a finalisation routine to strip the final trailing comma.

Note that this is probably not a good idea - the ordering of the
contacts will not be well-defined. When I asked about this Tom Lane was
quite surprised that it worked, so no guarantees about long-term suitability.

- Richard Huxton


Re: List Concatination

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> But - if you don't care about the order of contacts you can define an
> aggregate function:

> create aggregate catenate(sfunc1=textcat, basetype=text, stype1=text, initcond1='');

> Then group by client and catenate(firstname || ' ' || lastname)

With a custom aggregate you could make the aggregate function
responsible for handling the ordering of contacts:
select client, contactlist(contact) from table group by client;

If I were doing this, I'd make the aggregate state variable be "array of
text", and have the transition function simply append each new value to
the array.  (Or, if you're willing to assume that no newlines appear in
the contact names, the state variable can be plain text and can list the
contacts one per line.)  Then the finalization function would sort the
array elements and concatenate them with inserted commas.  These two
functions would be pretty trivial to write in pltcl or plperl, either
of which are the tool of first choice for string-bashing problems.

This wouldn't scale very well to huge numbers of contacts per client,
but for the numbers that would be reasonable to print out as single
lines of a report it should work fine.

> Note that this is probably not a good idea - the ordering of the
> contacts will not be well-defined. When I asked about this Tom Lane was
> quite surprised that it worked, so no guarantees about long-term suitability.

I don't recall the prior conversation, but certainly user-defined
aggregates are not going away...
        regards, tom lane


Re: List Concatination

From
"Josh Berkus"
Date:
Tom, Richard,
Thanks for the advice, guys!  This being Postgres, I *knew* there would
be other options.

> > create aggregate catenate(sfunc1=textcat, basetype=text,
> stype1=text, initcond1='');
> 
> > Then group by client and catenate(firstname || ' ' || lastname)
> 
> With a custom aggregate you could make the aggregate function
> responsible for handling the ordering of contacts:
> 
>  select client, contactlist(contact) from table group by client;
> 
> If I were doing this, I'd make the aggregate state variable be "array
> of
> text", and have the transition function simply append each new value
> to
> the array.  (Or, if you're willing to assume that no newlines appear
> in
> the contact names, the state variable can be plain text and can list
> the
> contacts one per line.)  Then the finalization function would sort
> the
> array elements and concatenate them with inserted commas.  These two
> functions would be pretty trivial to write in pltcl or plperl, either
> of which are the tool of first choice for string-bashing problems.
Hmmm... neither of these options sounds like it would be faster and
more scalable than a simple PL/pgSQL function which loops throught the
names and appends them to a string.  Perhaps for Phase II of our project
I'll be able to afford somebody to write a custom aggregate in C.
                -Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: List Concatination

From
Josh Berkus
Date:
Richard,
I wanted to thank you for the concatination suggestion ... on testing,
a custom aggregate *was* faster than procedural concatination ... much
faster.

> But - if you don't care about the order of contacts you can define an
> aggregate function:
> 
> create aggregate catenate(sfunc1=textcat, basetype=text, stype1=text, initcond1='');
> 
> Then group by client and catenate(firstname || ' ' || lastname)
> 
> You'll want to read the CREATE AGGREGATE page in the reference manual,
> replace textcat with your own routine that adds a comma and you'll need
> a finalisation routine to strip the final trailing comma.

Actually, if you use a sub-select as your data source, you can control
both the appearance and the order of the catenated values:

SELECT client, catenate(con_name)
FROM (SELECT client, (firstname || ' ' || lastname || ', 'FROM contacts ORDER BY lastname ASC) AS con_list
GROUP BY client;

This seems to work pretty well.

> Note that this is probably not a good idea - the ordering of the
> contacts will not be well-defined. When I asked about this Tom Lane was
> quite surprised that it worked, so no guarantees about long-term suitability.

Hmmm ... this feature is very, very, useful now that I know how to use
it.  I'd love to see it hang around for future versions of PgSQL.  Tom?

-Josh Berkus

-- 
______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology      josh@agliodbs.com   and data management solutions       (415) 565-7293  for law firms, small
businesses      fax  621-2533   and non-profit organizations.       San Francisco
 


Re: List Concatination

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> Note that this is probably not a good idea - the ordering of the
>> contacts will not be well-defined. When I asked about this Tom Lane was
>> quite surprised that it worked, so no guarantees about long-term
>> suitability.

> Hmmm ... this feature is very, very, useful now that I know how to use
> it.  I'd love to see it hang around for future versions of PgSQL.  Tom?

As I said before, user-defined aggregates are certainly not going away.
I don't recall the conversation Richard was thinking of, so I'm not sure
exactly what was at issue there.  Most likely it was some fine point,
not the basic existence of the feature.
        regards, tom lane


Re: List Concatination

From
Richard H
Date:
On 3/15/01, 5:02:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote regarding Re: 
[SQL] List Concatination :

> Josh Berkus <josh@agliodbs.com> writes:
> >> Note that this is probably not a good idea - the ordering of the
> >> contacts will not be well-defined. When I asked about this Tom Lane was
> >> quite surprised that it worked, so no guarantees about long-term
> >> suitability.

Sorry - issue was to do with the ordering of the concatenation, not the 
user-defined aggregates (iirc - it's getting late here).

I do remember I got different orders when selecting and updating. In my 
case it didn't matter, and I'm guessing if the order reverses in your 
case when 8.x is released it's not the end of the world either. If you 
were joining words in a sentence, obviously it would matter (unless you 
were on usenet ;-)

> > Hmmm ... this feature is very, very, useful now that I know how to use
> > it.  I'd love to see it hang around for future versions of PgSQL.  Tom?

> As I said before, user-defined aggregates are certainly not going away.
> I don't recall the conversation Richard was thinking of, so I'm not sure
> exactly what was at issue there.  Most likely it was some fine point,
> not the basic existence of the feature.

>                       regards, tom lane

No not at all - and sorry for any confusion.

- Richard Huxton


Re: List Concatination

From
"Josh Berkus"
Date:
Richard,

> Sorry - issue was to do with the ordering of the concatenation, not
> the 
> user-defined aggregates (iirc - it's getting late here).
> 
> I do remember I got different orders when selecting and updating. In
> my 
> case it didn't matter, and I'm guessing if the order reverses in your
> case when 8.x is released it's not the end of the world either.

As I said in my previous e-mail, I appear to have gotten the list to
order itself by basing it on an (ordered) sub-select.  Since the DB is
only 50% populated right now, I'm not sure that's working perfectly but
I'll keep you posted.

> If
> you 
> were joining words in a sentence, obviously it would matter (unless
> you 
> were on usenet ;-)

illiterate posters newsgroup most calling you are?

-Josh Berkus


______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: List Concatination

From
"Josh Berkus"
Date:
Tom,

> Yes, that oughta work fine to determine the order of inputs to the
> aggregate function.  ORDER BY in sub-selects is a new feature (heck,
> sub-selects in FROM at all is a new feature) in 7.1, so this trick
> wasn't available when Richard and I discussed the issue before.

Hey, why do you think that I was bugging you about 7.1 for months?  You
should see some of the things I do with sub-selects.   Err ...
programming-wise, that is ;-)

Speaking of which, when's the 7.1 release?

-Josh

______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: List Concatination

From
Tom Lane
Date:
"Josh Berkus" <josh@agliodbs.com> writes:
> As I said in my previous e-mail, I appear to have gotten the list to
> order itself by basing it on an (ordered) sub-select.  Since the DB is
> only 50% populated right now, I'm not sure that's working perfectly but
> I'll keep you posted.

Yes, that oughta work fine to determine the order of inputs to the
aggregate function.  ORDER BY in sub-selects is a new feature (heck,
sub-selects in FROM at all is a new feature) in 7.1, so this trick
wasn't available when Richard and I discussed the issue before.
        regards, tom lane