Thread: Re: Denormalizing during select

Re: Denormalizing during select

From
Jeff Boes
Date:
On Mon, 24 Feb 2003 12:18:24 -0500, no.spam wrote:

> I have two tables (A and B) in a one-to-many relationship. When
> retrieving data from one table (A), I need to denormalize the tables
> during a select so that values of a column in the many table (B) appear
> as a list in one field of the output.

Well, the straightforward way to do this would be with a function:

create or replace function my_func(text) returns text as '
declare name_one alias for $1; accum text; result record;

begin accum := \'\'; for result in select property from many_table where name = name_one
loop   if accum != \'\'   then     accum := accum || \',\';   end if;   accum := accum || result.property; end loop;
returnaccum; 
end;
' language 'plpgsql';

Note that the quotes internal to the function must be escaped.

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com          ...Nexcerpt... Extend your Expertise


Re: Denormalizing during select

From
Josh Berkus
Date:
Jeff,

> > I have two tables (A and B) in a one-to-many relationship. When
> > retrieving data from one table (A), I need to denormalize the tables
> > during a select so that values of a column in the many table (B) appear
> > as a list in one field of the output.
>
> Well, the straightforward way to do this would be with a function:

Can you post the start of this thread?  I think there's a more efficient way
using custom aggregates, but somhow I never received the original question.


--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Denormalizing during select

From
Josh Berkus
Date:
Jeff,

> > I have two tables (A and B) in a one-to-many relationship. When
> > retrieving data from one table (A), I need to denormalize the tables
> > during a select so that values of a column in the many table (B) appear
> > as a list in one field of the output.
>
> Well, the straightforward way to do this would be with a function:

Actually, it's much easier to do this using a custom aggregate:

CREATE FUNCTION "comma_cat" (text,text) RETURNS text AS '
SELECT CASE WHEN $2 IS NULL OR $2 = '''' THEN $1WHEN $1 IS NULL OR $1 = '''' THEN $2ELSE $1 || '', '' || $2END
' LANGUAGE 'sql';


CREATE AGGREGATE comma_list ( BASETYPE = text, SFUNC = comma_cat, STYPE =
text,
INITCOND = '' );

SELECT a.id, a.col1, a.col2, comma_list(b.text)
FROM a JOIN b on a.id = b.a_id
GROUP BY a.id, a.col1, a.col2

The only drawback of this approach is that you cannot order the items in the
list, but it is *much* faster than the function method that Jeff outlined.


--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Denormalizing during select

From
Edmund Lian
Date:
On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote:

>I found this example in "Practical PostgreSQL"... will it do the job?

Answering my own question: kind of. The problem with custom aggregates
is that they need to be used with a "group by" clause, and this means
that the select cannot return columns that are not aggregates of some
kind. What I'm trying to return are rows that are a combination of
columns and aggregates.

...Edmund.


Re: Denormalizing during select

From
Edmund Lian
Date:
On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote:

>I found this example in "Practical PostgreSQL"... will it do the job?

Answering my own question: kind of. The problem with custom aggregates
is that they need to be used with a "group by" clause, and this means
that the select cannot return columns that are not aggregates of some
kind. What I'm trying to return are rows that are a combination of
columns and aggregates.

...Edmund.


Re: Denormalizing during select

From
Rod Taylor
Date:
On Tue, 2003-02-25 at 13:48, Edmund Lian wrote:
> On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote:
>
> >I found this example in "Practical PostgreSQL"... will it do the job?
>
> Answering my own question: kind of. The problem with custom aggregates
> is that they need to be used with a "group by" clause, and this means
> that the select cannot return columns that are not aggregates of some
> kind. What I'm trying to return are rows that are a combination of
> columns and aggregates.

I've been trying to figure out how to give a running total (similar
issue I think).

key        value
1    5
1    5
1    5
2    1
2    2
2    1


Query output:
key    value    sum to point
1    5    5
1    5    10
1    5    15
2    1    1
2    2    3
2    1    4


I think I should be able to do it with an aggregate -- but the best I've
been able to come up with is a Set Returning Function.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Denormalizing during select

From
Rod Taylor
Date:
On Tue, 2003-02-25 at 13:48, Edmund Lian wrote:
> On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote:
>
> >I found this example in "Practical PostgreSQL"... will it do the job?
>
> Answering my own question: kind of. The problem with custom aggregates
> is that they need to be used with a "group by" clause, and this means
> that the select cannot return columns that are not aggregates of some
> kind. What I'm trying to return are rows that are a combination of
> columns and aggregates.

I've been trying to figure out how to give a running total (similar
issue I think).

key        value
1    5
1    5
1    5
2    1
2    2
2    1


Query output:
key    value    sum to point
1    5    5
1    5    10
1    5    15
2    1    1
2    2    3
2    1    4


I think I should be able to do it with an aggregate -- but the best I've
been able to come up with is a Set Returning Function.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Denormalizing during select

From
Tom Lane
Date:
Edmund Lian <no.spam@address.com> writes:
> Answering my own question: kind of. The problem with custom aggregates
> is that they need to be used with a "group by" clause, and this means
> that the select cannot return columns that are not aggregates of some
> kind. What I'm trying to return are rows that are a combination of
> columns and aggregates.

This strikes me as evidence of fuzzy thinking.  What sets of rows are
the aggregates taken over?  Which column values within those sets of
rows do you expect the non-aggregated column references to return?
If the columns aren't the ones grouped by, seems like you have an
inherently undefined result.

If you know for some reason that there will be only one unique value
of a column in a grouped row set, or you don't actually much care which
one you get, then you could use MIN() or MAX() on the column reference
to make it look like a kosher query.
        regards, tom lane


Re: Denormalizing during select

From
Greg Stark
Date:
Rod Taylor <rbt@rbt.ca> writes:

> I've been trying to figure out how to give a running total (similar
> issue I think).

Running totals are a "hard problem". They certainly cannot be solved using
aggregates. They're similar to the ranking problem of assigning a sequential
number to each item within each group.

The problem is they share certain properties of aggregate functions, namely
that they require persistent state storage and a state transition function.
But they definitely aren't aggregate functions in that they return a value for
every record, not one for the whole group.

I'm not even clear how to write an embedded (plpgsql or perl or python)
function, since I'm not clear how to allocate space for the state that will be
available for each call on each record but independent from other calls in the
same query. You have to be able to handle two running totals at the same time.

Note that running totals are not very sql-ish. Since sql deals in unordered
sets a running total is pretty ill-defined. It would have to be calculated
after the sort operation or else require you to sort the input tables in a
subquery or something. 

To write a proper well-defined sql-ish query for running totals you would have
to do the very inefficient:

select employee_id, salary,       (select count(*)    from employees x where x.salary < employee.salary) as
salary_rank,     (select sum(salary) from employees x where x.salary < employee.salary) as running_total from
employeesorderby salary desc
 

Finding a way to transform that into the single-scan plan that's obviously the
right way to execute it would be really cool but seems pretty far-fetched. I
don't think any database is capable of it today.

-- 
greg



Re: Denormalizing during select

From
Edmund Lian
Date:
Jeff and Josh,

I found this example in "Practical PostgreSQL"... will it do the job?

"""
The following example defines an aggregate function named sum(), for
use with the text data type. This aggregate calls the
textcat(text,text) function built into PostgreSQL to return a
concatenated "sum" of all the text found in its input values:

booktown=# CREATE AGGREGATE sum ( BASETYPE = text,
booktown(#                        SFUNC = textcat,
booktown(#                        STYPE = text,
booktown(#                        INITCOND = '' );
CREATE
booktown=# SELECT sum(title || ' ') FROM books WHERE title ~ '^L';             sum
-------------------------------Little Women Learning Python
(1 row)

"""

...Edmund.


Re: Denormalizing during select

From
no.spam@address.com
Date:
On Tue, 25 Feb 2003 01:22:42 +0000 (UTC), josh@agliodbs.com (Josh
Berkus) wrote:

>Can you post the start of this thread?  I think there's a more
>efficient way using custom aggregates, but somhow I never 
>received the original question.

I'm hoping there is... somehow, using a function to loop over every
row seems a bit inefficient? Here's my original question...

I have two tables (A and B) in a one-to-many relationship. When
retrieving data from one table (A), I need to denormalize the tables
during a select so that values of a column in the many table (B)
appear as a list in one field of the output. E.g.:

create table one_table (   name     varchar(20),   age      integer,   --   constraint one_table_pk       primary key
(name)
);

create table many_table (   name     varchar(20),   property varchar(15),   --   constraint many_table_pk       primary
key(name, property),   constraint many_table_fk       foreign key (name) references one_table (name)
 
);

So one_table could contain:   name            age   ----            ---   Mickey Mouse    50   Donald Duck     60
Goofy          45
 

and many_table could contain:   name            property   ----            --------   Mickey Mouse    Small   Mickey
Mouse   Black   Mickey Mouse    Squeaks   Donald Duck     Quacks
 


The query I'm trying to write would have as output something like:
   name           age   properties   ----           ---   ----------   Mickey Mouse   50    Small, Black, Squeaks
DonaldDuck    60    Quacks   Goofy          45    <null>
 


Could somebody please help point me in the right direction? I imagine
this is a common need...

Thanks,

...Edmund.


Convert a text list to text array? Was: Denormalizing during select

From
Guy Fraser
Date:
The email at the bottom gave me an idea, but it doesn't quite work:

CREATE AGGREGATE accumulate(  BASETYPE = text,  SFUNC = textcat,  STYPE = text,  INITCOND = '' );
--
SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft FROM pg_tables  WHERE hasindexes = 'f';
           cruft
--------------------------- {pg_xactlock,pg_listener}
(1 row)

This produces somthing that looks like it could be able to be converted into 
an array but I cant figure out how to make it work.



Guy

Edmund Lian wrote:
> Jeff and Josh,
> 
> I found this example in "Practical PostgreSQL"... will it do the job?
> 
> """
> The following example defines an aggregate function named sum(), for
> use with the text data type. This aggregate calls the
> textcat(text,text) function built into PostgreSQL to return a
> concatenated "sum" of all the text found in its input values:
> 
> booktown=# CREATE AGGREGATE sum ( BASETYPE = text,
> booktown(#                        SFUNC = textcat,
> booktown(#                        STYPE = text,
> booktown(#                        INITCOND = '' );
> CREATE
> booktown=# SELECT sum(title || ' ') FROM books WHERE title ~ '^L';
...snip...



Re: Convert a text list to text array? Was: Denormalizing during select

From
Greg Stark
Date:
You might be able to adapt the code in contrib/intagg to handle text.
I'm not sure what would be involved in doing so.

-- 
greg



Re: Convert a text list to text array? Was: Denormalizing during

From
Joe Conway
Date:
Guy Fraser wrote:
> The email at the bottom gave me an idea, but it doesn't quite work:
> 
> CREATE AGGREGATE accumulate(
>   BASETYPE = text,
>   SFUNC = textcat,
>   STYPE = text,
>   INITCOND = '' );
> -- 
> SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft
>  FROM pg_tables
>   WHERE hasindexes = 'f';
> 
>            cruft
> ---------------------------
>  {pg_xactlock,pg_listener}
> (1 row)
> 
> This produces somthing that looks like it could be able to be converted 
> into an array but I cant figure out how to make it work.
> 

If I understand correctly, the function array_accum() distributed with 
PL/R can do exactly what you're looking for (you need the function, but 
you don't necessarily need to use PL/R):

CREATE OR REPLACE FUNCTION array_accum (_name, name)
RETURNS name[]
AS '$libdir/plr','array_accum'
LANGUAGE 'C';

CREATE AGGREGATE accumulate (  sfunc = array_accum,  basetype = name,  stype = _name
);

regression=# SELECT accumulate(tablename) as cruft FROM pg_tables WHERE 
tablename LIKE 'c%';                 cruft
--------------------------------------- {connectby_int,connectby_text,ct,cth}
(1 row)


See:  http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html
and download at:  http://www.joeconway.com/plr/

HTH,

Joe



Re: Convert a text list to text array? Was: Denormalizing during

From
Guy Fraser
Date:
Thanks a lot

Joe Conway wrote:

> Guy Fraser wrote:
>
>> The email at the bottom gave me an idea, but it doesn't quite work:
>>
>> CREATE AGGREGATE accumulate(
>>   BASETYPE = text,
>>   SFUNC = textcat,
>>   STYPE = text,
>>   INITCOND = '' );
>> -- 
>> SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft
>>  FROM pg_tables
>>   WHERE hasindexes = 'f';
>>
>>            cruft
>> ---------------------------
>>  {pg_xactlock,pg_listener}
>> (1 row)
>>
>> This produces somthing that looks like it could be able to be 
>> converted into an array but I cant figure out how to make it work.
>>
>
> If I understand correctly, the function array_accum() distributed with 
> PL/R can do exactly what you're looking for (you need the function, 
> but you don't necessarily need to use PL/R):
>
> CREATE OR REPLACE FUNCTION array_accum (_name, name)
> RETURNS name[]
> AS '$libdir/plr','array_accum'
> LANGUAGE 'C';
>
> CREATE AGGREGATE accumulate (
>   sfunc = array_accum,
>   basetype = name,
>   stype = _name
> );
>
> regression=# SELECT accumulate(tablename) as cruft FROM pg_tables 
> WHERE tablename LIKE 'c%';
>                  cruft
> ---------------------------------------
>  {connectby_int,connectby_text,ct,cth}
> (1 row)
>
>
> See:
>   http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html
> and download at:
>   http://www.joeconway.com/plr/
>
> HTH,
>
> Joe
>
>