Thread: Create custom aggregate function and custom sfunc

Create custom aggregate function and custom sfunc

From
Jasmin Dizdarevic
Date:
 hi,
 
i have to create a aggregate function which evaluates a maximum text value but with some conditions i have to take care of.
is there a way to access a value set of each group?
e.g.:
 
customer ; seg
111 ; L1
111 ; L2
111 ; L1
222 ; L3
222 ; L3
222 ; L2
 
the result should look like this:
 
111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1
 
i hope you know what i mean.
 
ty
 
 

Re: Create custom aggregate function and custom sfunc

From
justin
Date:
Jasmin Dizdarevic wrote:
>  hi,
>  
> i have to create a aggregate function which evaluates a maximum text 
> value but with some conditions i have to take care of.
> is there a way to access a value set of each group?
> e.g.:
>  
> customer ; seg
> 111 ; L1
> 111 ; L2
> 111 ; L1
> 222 ; L3
> 222 ; L3
> 222 ; L2
>  
> the result should look like this:
>  
> 111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
> 222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1
>  
> i hope you know what i mean.
>  
> ty
>  
You don't have to create an aggregate function.  I have similar problem 
where the part numbers have to order based on the contents and the first 
thing you have to do is split it apart then set the ordering you want.  
This gives you an idea of what you can do and what the results look like. 

If the data in the table is laid out as you describe with where 111 and 
L1  are different fields its very easy and you don't have to create an 
aggregate function  

Select '111', 'L1',   regexp_replace( 'L1', '[1-9 `]+', ''),   regexp_replace( 'L1', '[a-zA-Z `]+', '')::integer
union
Select '111', 'L3',   regexp_replace( 'L3', '[1-9 `]+', ''),   regexp_replace( 'L3', '[a-zA-Z `]+', '')::integer
union
Select'111', 'L2',   regexp_replace( 'L2', '[1-9 `]+', ''),   regexp_replace( 'L2', '[a-zA-Z `]+', '')::integer

order by 3, 4


if the data is 111;L1 in a single field its still very easy. Example 
like so

Select split_part('111;L1', ';',1),   split_part('111;L1', ';',2),   regexp_replace( split_part('111;L1', ';',2), '[1-9
`]+',''),   regexp_replace( split_part('111;L1', ';',2), '[a-zA-Z `]+', '')::integer
 
union
Select split_part('111;L3', ';',1),   split_part('111;L3', ';',2),   regexp_replace( split_part('111;L3', ';',2), '[1-9
`]+',''),   regexp_replace( split_part('111;L3', ';',2), '[a-zA-Z `]+', '')::integer
 
union
Select split_part('111;L2', ';',1),   split_part('111;L2', ';',2),   regexp_replace( split_part('111;L2', ';',2), '[1-9
`]+',''),   regexp_replace( split_part('111;L2', ';',2), '[a-zA-Z `]+', 
 
'')::integer

order by 3, 4 desc



Re: Create custom aggregate function and custom sfunc

From
Greg Stark
Date:
On Thu, Jul 2, 2009 at 3:48 PM, Jasmin
Dizdarevic<jasmin.dizdarevic@gmail.com> wrote:
> customer ; seg
> 111 ; L1
> 111 ; L2
> 111 ; L1
> 222 ; L3
> 222 ; L3
> 222 ; L2
>
> the result should look like this:
>
> 111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
> 222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1

Are you just looking for the most frequent seg for each customer?

select distinct on (customer) customer,seg  from (select customer, seg, count(*) as n from tab group by seg) order by
customer,n desc 

That doesn't give the ratios though.

--
greg
http://mit.edu/~gsstark/resume.pdf


Re: Create custom aggregate function and custom sfunc

From
nha
Date:
Hello,

Le 2/07/09 23:21, Greg Stark a écrit :
> On Thu, Jul 2, 2009 at 3:48 PM, Jasmin
> Dizdarevic<jasmin.dizdarevic@gmail.com>  wrote:
>> customer ; seg
>> 111 ; L1
>> 111 ; L2
>> 111 ; L1
>> 222 ; L3
>> 222 ; L3
>> 222 ; L2
>>
>> the result should look like this:
>>
>> 111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
>> 222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1
>
> Are you just looking for the most frequent seg for each customer?
>
> select distinct on (customer) customer,seg
>     from (select customer, seg, count(*) as n from tab group by seg)
>    order by customer, n desc
>
> That doesn't give the ratios though.
>

Some errors would occur with the above query from within the subquery 
because of a projection on customer column that is not a grouped column 
(neither an uniquely identified column by seg--seg is assumed not to be 
a primary key here according to the given examples of value). By the 
way, the ordered column n is not a member of the projected columns.

Nevertheless, in the same direction as Greg Stark, the following query 
would approach the target result:

SELECT T5A.customer, T5A.seg
FROM (SELECT T2.customer, T2.seg, COUNT(*) AS nbFROM cst T2GROUP BY T2.customer, T2.seg
) T5A INNER JOIN (SELECT T4.customer, MAX(T4.nb) AS maxNbFROM (    SELECT T2B.customer, T2B.seg, COUNT(*) AS nb    FROM
cstT2B    GROUP BY T2B.customer, T2B.seg) T4GROUP BY T4.customer
 
) T5B
ON T5A.customer = T5B.customer
AND T5A.nb = T5B.maxNb

where 'cst' is the reference table including 'customer' and 'seg' 
columns. This query considers a join between two tables:
- the first table, aliased T5A, counts the multiplicity for each couple 
of 'customer' and 'seg';
- the secund table, aliased T5B, reveals the highest multiplicity for 
each 'customer' with regard to each 'seg' attached to customer.
Then the join only retains the couples of 'customer' and 'seg' whom 
multiplicity equals the higher for the 'customer' of the current couple.

Each multiplicity may be easily added to the resulting records by 
spanning the projected columns with T5A.nb column. A sort of ratio may 
also be added; one aggregated column may first be added to table T4 as 
SUM(T4.nb) -- that is, sum of multiplicity for each customer; then the 
resulting join may compute the ratio between the highest multiplicity 
and the sum of multiplicity for each customer, in addition to the seg 
(or list of seg) for which the multiplicity is the highest.

Here is the modified query:

SELECT T5A.customer, T5A.seg, T5A.nb/T5B.sumNb
FROM (SELECT T2.customer, T2.seg, COUNT(*) AS nbFROM cst T2GROUP BY T2.customer, T2.seg
) T5A INNER JOIN (SELECT T4.customer, MAX(T4.nb) AS maxNb, SUM(T4.nb) AS sumNbFROM (    SELECT T2B.customer, T2B.seg,
COUNT(*)AS nb    FROM cst T2B    GROUP BY T2B.customer, T2B.seg) T4GROUP BY T4.customer
 
) T5B
ON T5A.customer = T5B.customer
AND T5A.nb = T5B.maxNb

As is, this query may result to multiple couples of customer and seg 
with the same customer value when many seg relie as many times as many 
others for the current customer. This "edge effect" may be avoided in 
many ways depending on the original purpose.

Regards.

--
nha / Lyon / France.


Re: Create custom aggregate function and custom sfunc

From
Jasmin Dizdarevic
Date:
Hi,
 
thank you everybody for your help. The classy solution posted by nha works great!
 
Regards
Jasmin

2009/7/3 nha <lyondif02@free.fr>
Hello,

Le 2/07/09 23:21, Greg Stark a écrit :

On Thu, Jul 2, 2009 at 3:48 PM, Jasmin
Dizdarevic<jasmin.dizdarevic@gmail.com>  wrote:
customer ; seg
111 ; L1
111 ; L2
111 ; L1
222 ; L3
222 ; L3
222 ; L2

the result should look like this:

111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1

Are you just looking for the most frequent seg for each customer?

select distinct on (customer) customer,seg
   from (select customer, seg, count(*) as n from tab group by seg)
  order by customer, n desc

That doesn't give the ratios though.


Some errors would occur with the above query from within the subquery because of a projection on customer column that is not a grouped column (neither an uniquely identified column by seg--seg is assumed not to be a primary key here according to the given examples of value). By the way, the ordered column n is not a member of the projected columns.

Nevertheless, in the same direction as Greg Stark, the following query would approach the target result:

SELECT T5A.customer, T5A.seg
FROM (
       SELECT T2.customer, T2.seg, COUNT(*) AS nb
       FROM cst T2
       GROUP BY T2.customer, T2.seg
) T5A INNER JOIN (
       SELECT T4.customer, MAX(T4.nb) AS maxNb
       FROM (
               SELECT T2B.customer, T2B.seg, COUNT(*) AS nb
               FROM cst T2B
               GROUP BY T2B.customer, T2B.seg
       ) T4
       GROUP BY T4.customer
) T5B
ON T5A.customer = T5B.customer
AND T5A.nb = T5B.maxNb

where 'cst' is the reference table including 'customer' and 'seg' columns. This query considers a join between two tables:
- the first table, aliased T5A, counts the multiplicity for each couple of 'customer' and 'seg';
- the secund table, aliased T5B, reveals the highest multiplicity for each 'customer' with regard to each 'seg' attached to customer.
Then the join only retains the couples of 'customer' and 'seg' whom multiplicity equals the higher for the 'customer' of the current couple.

Each multiplicity may be easily added to the resulting records by spanning the projected columns with T5A.nb column. A sort of ratio may also be added; one aggregated column may first be added to table T4 as SUM(T4.nb) -- that is, sum of multiplicity for each customer; then the resulting join may compute the ratio between the highest multiplicity and the sum of multiplicity for each customer, in addition to the seg (or list of seg) for which the multiplicity is the highest.

Here is the modified query:

SELECT T5A.customer, T5A.seg, T5A.nb/T5B.sumNb
FROM (
       SELECT T2.customer, T2.seg, COUNT(*) AS nb
       FROM cst T2
       GROUP BY T2.customer, T2.seg
) T5A INNER JOIN (
       SELECT T4.customer, MAX(T4.nb) AS maxNb, SUM(T4.nb) AS sumNb
       FROM (
               SELECT T2B.customer, T2B.seg, COUNT(*) AS nb
               FROM cst T2B
               GROUP BY T2B.customer, T2B.seg
       ) T4
       GROUP BY T4.customer
) T5B
ON T5A.customer = T5B.customer
AND T5A.nb = T5B.maxNb

As is, this query may result to multiple couples of customer and seg with the same customer value when many seg relie as many times as many others for the current customer. This "edge effect" may be avoided in many ways depending on the original purpose.

Regards.

--
nha / Lyon / France.