Thread: interesting SQL puzzle - concatenating column with itself.
Hi- I'm doing a conversion from an older database in which a memo field was handled by storing one line per record and thendisplaying the related records in order. I want to compress all of the lines into a single text field with one recordper memo entry. So for instance, the old database looks like this: memo_id | sequence | memo_text --------------------------------------- 666 | 1 | The quick 666 | 2 | red fox 666 | 3 | jumped over 666 | 4 | the lazy brown dog And my goal is to transform each group of lines into a single record that looks like this: memo_id | memo_text ---------------------------------------------------------- 666 | The quick red fox jumped over the lazy brown dog Any thoughts on how to do this via sql? I could write a little program to do it, but it seems like there must be a pure-SQLsolution that I'm not seeing. Thanks -Nick -- ------------------------------------------------------------------ Nick Fankhauser nickf@ontko.com Phone 765.935.4283 Fax 765.962.9788 Ray Ontko & Co. - Software Consulting Services http://www.ontko.com
On Sun, May 08, 2005 at 01:55:56PM -0500, Nick Fankhauser wrote: > > Hi- I'm doing a conversion from an older database in which a memo field was > handled by storing one line per record and then displaying the related > records in order. I want to compress all of the lines into a single text > field with one record per memo entry. > > > So for instance, the old database looks like this: > > > memo_id | sequence | memo_text > --------------------------------------- > 666 | 1 | The quick > 666 | 2 | red fox > 666 | 3 | jumped over > 666 | 4 | the lazy brown dog > > > And my goal is to transform each group of lines into a single record that > looks like this: > > > memo_id | memo_text > ---------------------------------------------------------- > 666 | The quick red fox jumped over the lazy brown dog > > > Any thoughts on how to do this via sql? I could write a little program to > do it, but it seems like there must be a pure-SQL solution that I'm not > seeing. You can create a custom aggregate to do concatenation and group by memo_id. create or replace function text_cat(text, text) returns text called on null input language sql immutable as 'select case when $1 is null then $2 when $2 is null then $1 else $1 || '' '' || $2 end'; create aggregate textcat (basetype = text, sfunc = text_cat, stype = text); create table memos (memo_id int, sequence int, memo_text text); insert into memos values (666, 3, 'jumped over'); insert into memos values (666, 1, 'The quick'); insert into memos values (666, 4, 'the lazy brown dog'); insert into memos values (666, 2, 'red fox'); select memo_id, textcat(memo_text) from (select * from memos order by memo_id, sequence) as foo group by memo_id; The order is not really guaranteed, though if this is a one-shot thing, you may get away with turning off hashed aggregates. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "There was no reply" (Kernel Traffic)
Excellent! Thanks for providing both the idea and an example. I didn't get the idea right away, but the example made it clear.I'll try that on my table and report back on how it works out. Regards, -Nick ------------------------------------------------------------------ Nick Fankhauser nickf@ontko.com Phone 765.935.4283 Fax 765.962.9788 Ray Ontko & Co. - Software Consulting Services http://www.ontko.com Alvaro Herrera wrote: > > You can create a custom aggregate to do concatenation and group by > memo_id. > > create or replace function text_cat(text, text) returns text called on null input > language sql immutable as 'select case when $1 is null then $2 when $2 > is null then $1 else $1 || '' '' || $2 end'; > > create aggregate textcat (basetype = text, sfunc = text_cat, stype = text); > > create table memos (memo_id int, sequence int, memo_text text); > insert into memos values (666, 3, 'jumped over'); > insert into memos values (666, 1, 'The quick'); > insert into memos values (666, 4, 'the lazy brown dog'); > insert into memos values (666, 2, 'red fox'); > > select memo_id, textcat(memo_text) from (select * from memos order by > memo_id, sequence) as foo group by memo_id; > > > The order is not really guaranteed, though if this is a one-shot thing, > you may get away with turning off hashed aggregates. >
Alvaro Herrera wrote: > The order is not really guaranteed, though if this is a one-shot thing, > you may get away with turning off hashed aggregates. > When I read this, I assumed there was a runtime parameter I could set that was similar to ENABLE_HASHJOIN. Are you referringto a different runtime parameter or something else entirely? BTW the concatenation function you suggest works nicely except that as you noted, it concatenates in an unpredictable order,so I'm now trying to solve that problem. -Nick ------------------------------------------------------------------ Nick Fankhauser nickf@ontko.com Phone 765.935.4283 Fax 765.962.9788 Ray Ontko & Co. - Software Consulting Services http://www.ontko.com
> BTW the concatenation function you suggest works nicely except that as > you noted, it concatenates in an unpredictable order, so I'm now trying > to solve that problem. memo_id | sequence | memo_text --------------------------------------- 666 | 1 | The quick 666 | 2 | red fox 666 | 3 | jumped over 666 | 4 | the lazy brown dog You have : SELECT your_concat( memo_text ) FROM table GROUP BY memo_id You can use : SELECT your_concat( memo_text ) FROM (SELECT memo_id, sequence, memo_text FROM table ORDER BY memo_id, sequence OFFSET 0) AS foo GROUP BY memo_id the OFFSET 0 may be necessary (or not). Try it !
Nick Fankhauser <nickf@ontko.com> writes: > Alvaro Herrera wrote: > > > The order is not really guaranteed, though if this is a one-shot thing, > > you may get away with turning off hashed aggregates. > > > > When I read this, I assumed there was a runtime parameter I could set that was > similar to ENABLE_HASHJOIN. Are you referring to a different runtime parameter > or something else entirely? Similar but different. enable_hashagg. You can see all of these with "show all" in psql. However you do not have to worry. Even with (actually, *especially with*) hash aggregates the records will be processed in the order they're received. It's actually the normal aggregate method of sorting on the GROUP BY columns that risks damaging the order. However Postgres does a special check to see if the subquery is already sorted by the GROUP BY column and avoids the extra sort which could cause you problems. So this is not guaranteed by the SQL spec to work (but then the SQL spec doesn't have custom aggregates at all) but Postgres goes out of its way to make sure this doesn't break unnecessarily. [This is all for 8.0 and I think 7.4. Some versions before that might -- greg
> SELECT your_concat( memo_text ) FROM > (SELECT memo_id, sequence, memo_text FROM table ORDER BY memo_id, sequence > OFFSET 0) AS foo > GROUP BY memo_id I'm just curious - what's the 'OFFSET 0' for? Dmitri
> I'm just curious - what's the 'OFFSET 0' for? Trick to fool postgres into thinking it can't rewrite out your subquery and eliminate it ...