Thread: Re: Denormalizing during select
On Mon, 24 Feb 2003 12:18:24 -0500, no.spam wrote: > I have two tables (A and B) in a one-to-many relationship. When > retrieving data from one table (A), I need to denormalize the tables > during a select so that values of a column in the many table (B) appear > as a list in one field of the output. Well, the straightforward way to do this would be with a function: create or replace function my_func(text) returns text as ' declare name_one alias for $1; accum text; result record; begin accum := \'\'; for result in select property from many_table where name = name_one loop if accum != \'\' then accum := accum || \',\'; end if; accum := accum || result.property; end loop; returnaccum; end; ' language 'plpgsql'; Note that the quotes internal to the function must be escaped. -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise
Jeff, > > I have two tables (A and B) in a one-to-many relationship. When > > retrieving data from one table (A), I need to denormalize the tables > > during a select so that values of a column in the many table (B) appear > > as a list in one field of the output. > > Well, the straightforward way to do this would be with a function: Can you post the start of this thread? I think there's a more efficient way using custom aggregates, but somhow I never received the original question. -- -Josh BerkusAglio Database SolutionsSan Francisco
Jeff, > > I have two tables (A and B) in a one-to-many relationship. When > > retrieving data from one table (A), I need to denormalize the tables > > during a select so that values of a column in the many table (B) appear > > as a list in one field of the output. > > Well, the straightforward way to do this would be with a function: Actually, it's much easier to do this using a custom aggregate: CREATE FUNCTION "comma_cat" (text,text) RETURNS text AS ' SELECT CASE WHEN $2 IS NULL OR $2 = '''' THEN $1WHEN $1 IS NULL OR $1 = '''' THEN $2ELSE $1 || '', '' || $2END ' LANGUAGE 'sql'; CREATE AGGREGATE comma_list ( BASETYPE = text, SFUNC = comma_cat, STYPE = text, INITCOND = '' ); SELECT a.id, a.col1, a.col2, comma_list(b.text) FROM a JOIN b on a.id = b.a_id GROUP BY a.id, a.col1, a.col2 The only drawback of this approach is that you cannot order the items in the list, but it is *much* faster than the function method that Jeff outlined. -- -Josh BerkusAglio Database SolutionsSan Francisco
On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote: >I found this example in "Practical PostgreSQL"... will it do the job? Answering my own question: kind of. The problem with custom aggregates is that they need to be used with a "group by" clause, and this means that the select cannot return columns that are not aggregates of some kind. What I'm trying to return are rows that are a combination of columns and aggregates. ...Edmund.
On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote: >I found this example in "Practical PostgreSQL"... will it do the job? Answering my own question: kind of. The problem with custom aggregates is that they need to be used with a "group by" clause, and this means that the select cannot return columns that are not aggregates of some kind. What I'm trying to return are rows that are a combination of columns and aggregates. ...Edmund.
On Tue, 2003-02-25 at 13:48, Edmund Lian wrote: > On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote: > > >I found this example in "Practical PostgreSQL"... will it do the job? > > Answering my own question: kind of. The problem with custom aggregates > is that they need to be used with a "group by" clause, and this means > that the select cannot return columns that are not aggregates of some > kind. What I'm trying to return are rows that are a combination of > columns and aggregates. I've been trying to figure out how to give a running total (similar issue I think). key value 1 5 1 5 1 5 2 1 2 2 2 1 Query output: key value sum to point 1 5 5 1 5 10 1 5 15 2 1 1 2 2 3 2 1 4 I think I should be able to do it with an aggregate -- but the best I've been able to come up with is a Set Returning Function. -- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
On Tue, 2003-02-25 at 13:48, Edmund Lian wrote: > On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote: > > >I found this example in "Practical PostgreSQL"... will it do the job? > > Answering my own question: kind of. The problem with custom aggregates > is that they need to be used with a "group by" clause, and this means > that the select cannot return columns that are not aggregates of some > kind. What I'm trying to return are rows that are a combination of > columns and aggregates. I've been trying to figure out how to give a running total (similar issue I think). key value 1 5 1 5 1 5 2 1 2 2 2 1 Query output: key value sum to point 1 5 5 1 5 10 1 5 15 2 1 1 2 2 3 2 1 4 I think I should be able to do it with an aggregate -- but the best I've been able to come up with is a Set Returning Function. -- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
Edmund Lian <no.spam@address.com> writes: > Answering my own question: kind of. The problem with custom aggregates > is that they need to be used with a "group by" clause, and this means > that the select cannot return columns that are not aggregates of some > kind. What I'm trying to return are rows that are a combination of > columns and aggregates. This strikes me as evidence of fuzzy thinking. What sets of rows are the aggregates taken over? Which column values within those sets of rows do you expect the non-aggregated column references to return? If the columns aren't the ones grouped by, seems like you have an inherently undefined result. If you know for some reason that there will be only one unique value of a column in a grouped row set, or you don't actually much care which one you get, then you could use MIN() or MAX() on the column reference to make it look like a kosher query. regards, tom lane
Rod Taylor <rbt@rbt.ca> writes: > I've been trying to figure out how to give a running total (similar > issue I think). Running totals are a "hard problem". They certainly cannot be solved using aggregates. They're similar to the ranking problem of assigning a sequential number to each item within each group. The problem is they share certain properties of aggregate functions, namely that they require persistent state storage and a state transition function. But they definitely aren't aggregate functions in that they return a value for every record, not one for the whole group. I'm not even clear how to write an embedded (plpgsql or perl or python) function, since I'm not clear how to allocate space for the state that will be available for each call on each record but independent from other calls in the same query. You have to be able to handle two running totals at the same time. Note that running totals are not very sql-ish. Since sql deals in unordered sets a running total is pretty ill-defined. It would have to be calculated after the sort operation or else require you to sort the input tables in a subquery or something. To write a proper well-defined sql-ish query for running totals you would have to do the very inefficient: select employee_id, salary, (select count(*) from employees x where x.salary < employee.salary) as salary_rank, (select sum(salary) from employees x where x.salary < employee.salary) as running_total from employeesorderby salary desc Finding a way to transform that into the single-scan plan that's obviously the right way to execute it would be really cool but seems pretty far-fetched. I don't think any database is capable of it today. -- greg
Jeff and Josh, I found this example in "Practical PostgreSQL"... will it do the job? """ The following example defines an aggregate function named sum(), for use with the text data type. This aggregate calls the textcat(text,text) function built into PostgreSQL to return a concatenated "sum" of all the text found in its input values: booktown=# CREATE AGGREGATE sum ( BASETYPE = text, booktown(# SFUNC = textcat, booktown(# STYPE = text, booktown(# INITCOND = '' ); CREATE booktown=# SELECT sum(title || ' ') FROM books WHERE title ~ '^L'; sum -------------------------------Little Women Learning Python (1 row) """ ...Edmund.
On Tue, 25 Feb 2003 01:22:42 +0000 (UTC), josh@agliodbs.com (Josh Berkus) wrote: >Can you post the start of this thread? I think there's a more >efficient way using custom aggregates, but somhow I never >received the original question. I'm hoping there is... somehow, using a function to loop over every row seems a bit inefficient? Here's my original question... I have two tables (A and B) in a one-to-many relationship. When retrieving data from one table (A), I need to denormalize the tables during a select so that values of a column in the many table (B) appear as a list in one field of the output. E.g.: create table one_table ( name varchar(20), age integer, -- constraint one_table_pk primary key (name) ); create table many_table ( name varchar(20), property varchar(15), -- constraint many_table_pk primary key(name, property), constraint many_table_fk foreign key (name) references one_table (name) ); So one_table could contain: name age ---- --- Mickey Mouse 50 Donald Duck 60 Goofy 45 and many_table could contain: name property ---- -------- Mickey Mouse Small Mickey Mouse Black Mickey Mouse Squeaks Donald Duck Quacks The query I'm trying to write would have as output something like: name age properties ---- --- ---------- Mickey Mouse 50 Small, Black, Squeaks DonaldDuck 60 Quacks Goofy 45 <null> Could somebody please help point me in the right direction? I imagine this is a common need... Thanks, ...Edmund.
The email at the bottom gave me an idea, but it doesn't quite work: CREATE AGGREGATE accumulate( BASETYPE = text, SFUNC = textcat, STYPE = text, INITCOND = '' ); -- SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft FROM pg_tables WHERE hasindexes = 'f'; cruft --------------------------- {pg_xactlock,pg_listener} (1 row) This produces somthing that looks like it could be able to be converted into an array but I cant figure out how to make it work. Guy Edmund Lian wrote: > Jeff and Josh, > > I found this example in "Practical PostgreSQL"... will it do the job? > > """ > The following example defines an aggregate function named sum(), for > use with the text data type. This aggregate calls the > textcat(text,text) function built into PostgreSQL to return a > concatenated "sum" of all the text found in its input values: > > booktown=# CREATE AGGREGATE sum ( BASETYPE = text, > booktown(# SFUNC = textcat, > booktown(# STYPE = text, > booktown(# INITCOND = '' ); > CREATE > booktown=# SELECT sum(title || ' ') FROM books WHERE title ~ '^L'; ...snip...
You might be able to adapt the code in contrib/intagg to handle text. I'm not sure what would be involved in doing so. -- greg
Guy Fraser wrote: > The email at the bottom gave me an idea, but it doesn't quite work: > > CREATE AGGREGATE accumulate( > BASETYPE = text, > SFUNC = textcat, > STYPE = text, > INITCOND = '' ); > -- > SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft > FROM pg_tables > WHERE hasindexes = 'f'; > > cruft > --------------------------- > {pg_xactlock,pg_listener} > (1 row) > > This produces somthing that looks like it could be able to be converted > into an array but I cant figure out how to make it work. > If I understand correctly, the function array_accum() distributed with PL/R can do exactly what you're looking for (you need the function, but you don't necessarily need to use PL/R): CREATE OR REPLACE FUNCTION array_accum (_name, name) RETURNS name[] AS '$libdir/plr','array_accum' LANGUAGE 'C'; CREATE AGGREGATE accumulate ( sfunc = array_accum, basetype = name, stype = _name ); regression=# SELECT accumulate(tablename) as cruft FROM pg_tables WHERE tablename LIKE 'c%'; cruft --------------------------------------- {connectby_int,connectby_text,ct,cth} (1 row) See: http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html and download at: http://www.joeconway.com/plr/ HTH, Joe
Thanks a lot Joe Conway wrote: > Guy Fraser wrote: > >> The email at the bottom gave me an idea, but it doesn't quite work: >> >> CREATE AGGREGATE accumulate( >> BASETYPE = text, >> SFUNC = textcat, >> STYPE = text, >> INITCOND = '' ); >> -- >> SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft >> FROM pg_tables >> WHERE hasindexes = 'f'; >> >> cruft >> --------------------------- >> {pg_xactlock,pg_listener} >> (1 row) >> >> This produces somthing that looks like it could be able to be >> converted into an array but I cant figure out how to make it work. >> > > If I understand correctly, the function array_accum() distributed with > PL/R can do exactly what you're looking for (you need the function, > but you don't necessarily need to use PL/R): > > CREATE OR REPLACE FUNCTION array_accum (_name, name) > RETURNS name[] > AS '$libdir/plr','array_accum' > LANGUAGE 'C'; > > CREATE AGGREGATE accumulate ( > sfunc = array_accum, > basetype = name, > stype = _name > ); > > regression=# SELECT accumulate(tablename) as cruft FROM pg_tables > WHERE tablename LIKE 'c%'; > cruft > --------------------------------------- > {connectby_int,connectby_text,ct,cth} > (1 row) > > > See: > http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html > and download at: > http://www.joeconway.com/plr/ > > HTH, > > Joe > >