Thread: List Concatination
Folks, I have an interesting problem. For purpose of presentation to users, I'd like to concatinate a list of VARCHAR values from a subtable. To simplify my actual situation: CREATE TABLE clients (client_id SERIAL PRIMARY KEY,client_name VARCHAR(50) ); CREATE TABLE client_contacts (contact_id SERIAL PRIMARY KEY,client_id REFERENCES clients(client_id),first_name VARCHAR(25),last_nameVARCHAR(25) ); CLients: 1 McMurphy Assoc. 2 Ross Construction Contacts 1 1 Jim Henson 2 1 Pat O'Gannet 3 2 Sara Vaugn 3 2 Bill Murray etc. What I'd like to be able to do is present a list of clients and their comma-seperated contacts in paragraph form, hence: Client Contacts McMurphy Assoc. Jim Henson, Pat O'Gannet, Kelley Moore Ross Construction Sara Vaugn, Bill Murray, Peter Frump, Siskel Ebert I can figure out how to do this procedurally (using PL/pgSQL and a cursor) but not how to do it declaratively (using only SQL). The reason it's important to do it declaritively is that there are actually two sub-tables with thousands of entries and the procedural approach is rather slow. Suggestions? -Josh Berkus -- ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Josh Berkus wrote: > I have an interesting problem. For purpose of presentation to users, > I'd like to concatinate a list of VARCHAR values from a subtable. To > simplify my actual situation: > > What I'd like to be able to do is present a list of clients and their > comma-seperated contacts in paragraph form, hence: > > Client Contacts > McMurphy Assoc. Jim Henson, Pat O'Gannet, Kelley Moore > > Ross Construction Sara Vaugn, Bill Murray, Peter Frump, > Siskel Ebert > Well, basically you can use a standard join, order it and eliminate duplicate client names in the application. That's the "proper" way. But - if you don't care about the order of contacts you can define an aggregate function: create aggregate catenate(sfunc1=textcat, basetype=text, stype1=text, initcond1=''); Then group by client and catenate(firstname || ' ' || lastname) You'll want to read the CREATE AGGREGATE page in the reference manual, replace textcat with your own routine that adds a comma and you'll need a finalisation routine to strip the final trailing comma. Note that this is probably not a good idea - the ordering of the contacts will not be well-defined. When I asked about this Tom Lane was quite surprised that it worked, so no guarantees about long-term suitability. - Richard Huxton
Richard Huxton <dev@archonet.com> writes: > But - if you don't care about the order of contacts you can define an > aggregate function: > create aggregate catenate(sfunc1=textcat, basetype=text, stype1=text, initcond1=''); > Then group by client and catenate(firstname || ' ' || lastname) With a custom aggregate you could make the aggregate function responsible for handling the ordering of contacts: select client, contactlist(contact) from table group by client; If I were doing this, I'd make the aggregate state variable be "array of text", and have the transition function simply append each new value to the array. (Or, if you're willing to assume that no newlines appear in the contact names, the state variable can be plain text and can list the contacts one per line.) Then the finalization function would sort the array elements and concatenate them with inserted commas. These two functions would be pretty trivial to write in pltcl or plperl, either of which are the tool of first choice for string-bashing problems. This wouldn't scale very well to huge numbers of contacts per client, but for the numbers that would be reasonable to print out as single lines of a report it should work fine. > Note that this is probably not a good idea - the ordering of the > contacts will not be well-defined. When I asked about this Tom Lane was > quite surprised that it worked, so no guarantees about long-term suitability. I don't recall the prior conversation, but certainly user-defined aggregates are not going away... regards, tom lane
Tom, Richard, Thanks for the advice, guys! This being Postgres, I *knew* there would be other options. > > create aggregate catenate(sfunc1=textcat, basetype=text, > stype1=text, initcond1=''); > > > Then group by client and catenate(firstname || ' ' || lastname) > > With a custom aggregate you could make the aggregate function > responsible for handling the ordering of contacts: > > select client, contactlist(contact) from table group by client; > > If I were doing this, I'd make the aggregate state variable be "array > of > text", and have the transition function simply append each new value > to > the array. (Or, if you're willing to assume that no newlines appear > in > the contact names, the state variable can be plain text and can list > the > contacts one per line.) Then the finalization function would sort > the > array elements and concatenate them with inserted commas. These two > functions would be pretty trivial to write in pltcl or plperl, either > of which are the tool of first choice for string-bashing problems. Hmmm... neither of these options sounds like it would be faster and more scalable than a simple PL/pgSQL function which loops throught the names and appends them to a string. Perhaps for Phase II of our project I'll be able to afford somebody to write a custom aggregate in C. -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Richard, I wanted to thank you for the concatination suggestion ... on testing, a custom aggregate *was* faster than procedural concatination ... much faster. > But - if you don't care about the order of contacts you can define an > aggregate function: > > create aggregate catenate(sfunc1=textcat, basetype=text, stype1=text, initcond1=''); > > Then group by client and catenate(firstname || ' ' || lastname) > > You'll want to read the CREATE AGGREGATE page in the reference manual, > replace textcat with your own routine that adds a comma and you'll need > a finalisation routine to strip the final trailing comma. Actually, if you use a sub-select as your data source, you can control both the appearance and the order of the catenated values: SELECT client, catenate(con_name) FROM (SELECT client, (firstname || ' ' || lastname || ', 'FROM contacts ORDER BY lastname ASC) AS con_list GROUP BY client; This seems to work pretty well. > Note that this is probably not a good idea - the ordering of the > contacts will not be well-defined. When I asked about this Tom Lane was > quite surprised that it worked, so no guarantees about long-term suitability. Hmmm ... this feature is very, very, useful now that I know how to use it. I'd love to see it hang around for future versions of PgSQL. Tom? -Josh Berkus -- ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Josh Berkus <josh@agliodbs.com> writes: >> Note that this is probably not a good idea - the ordering of the >> contacts will not be well-defined. When I asked about this Tom Lane was >> quite surprised that it worked, so no guarantees about long-term >> suitability. > Hmmm ... this feature is very, very, useful now that I know how to use > it. I'd love to see it hang around for future versions of PgSQL. Tom? As I said before, user-defined aggregates are certainly not going away. I don't recall the conversation Richard was thinking of, so I'm not sure exactly what was at issue there. Most likely it was some fine point, not the basic existence of the feature. regards, tom lane
On 3/15/01, 5:02:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote regarding Re: [SQL] List Concatination : > Josh Berkus <josh@agliodbs.com> writes: > >> Note that this is probably not a good idea - the ordering of the > >> contacts will not be well-defined. When I asked about this Tom Lane was > >> quite surprised that it worked, so no guarantees about long-term > >> suitability. Sorry - issue was to do with the ordering of the concatenation, not the user-defined aggregates (iirc - it's getting late here). I do remember I got different orders when selecting and updating. In my case it didn't matter, and I'm guessing if the order reverses in your case when 8.x is released it's not the end of the world either. If you were joining words in a sentence, obviously it would matter (unless you were on usenet ;-) > > Hmmm ... this feature is very, very, useful now that I know how to use > > it. I'd love to see it hang around for future versions of PgSQL. Tom? > As I said before, user-defined aggregates are certainly not going away. > I don't recall the conversation Richard was thinking of, so I'm not sure > exactly what was at issue there. Most likely it was some fine point, > not the basic existence of the feature. > regards, tom lane No not at all - and sorry for any confusion. - Richard Huxton
Richard, > Sorry - issue was to do with the ordering of the concatenation, not > the > user-defined aggregates (iirc - it's getting late here). > > I do remember I got different orders when selecting and updating. In > my > case it didn't matter, and I'm guessing if the order reverses in your > case when 8.x is released it's not the end of the world either. As I said in my previous e-mail, I appear to have gotten the list to order itself by basing it on an (ordered) sub-select. Since the DB is only 50% populated right now, I'm not sure that's working perfectly but I'll keep you posted. > If > you > were joining words in a sentence, obviously it would matter (unless > you > were on usenet ;-) illiterate posters newsgroup most calling you are? -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Tom, > Yes, that oughta work fine to determine the order of inputs to the > aggregate function. ORDER BY in sub-selects is a new feature (heck, > sub-selects in FROM at all is a new feature) in 7.1, so this trick > wasn't available when Richard and I discussed the issue before. Hey, why do you think that I was bugging you about 7.1 for months? You should see some of the things I do with sub-selects. Err ... programming-wise, that is ;-) Speaking of which, when's the 7.1 release? -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
"Josh Berkus" <josh@agliodbs.com> writes: > As I said in my previous e-mail, I appear to have gotten the list to > order itself by basing it on an (ordered) sub-select. Since the DB is > only 50% populated right now, I'm not sure that's working perfectly but > I'll keep you posted. Yes, that oughta work fine to determine the order of inputs to the aggregate function. ORDER BY in sub-selects is a new feature (heck, sub-selects in FROM at all is a new feature) in 7.1, so this trick wasn't available when Richard and I discussed the issue before. regards, tom lane