Thread: Create custom aggregate function and custom sfunc
hi,
i have to create a aggregate function which evaluates a maximum text value but with some conditions i have to take care of.
is there a way to access a value set of each group?
e.g.:
customer ; seg
111 ; L1
111 ; L2
111 ; L1
222 ; L3
222 ; L3
222 ; L2
the result should look like this:
111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1
i hope you know what i mean.
ty
Jasmin Dizdarevic wrote: > hi, > > i have to create a aggregate function which evaluates a maximum text > value but with some conditions i have to take care of. > is there a way to access a value set of each group? > e.g.: > > customer ; seg > 111 ; L1 > 111 ; L2 > 111 ; L1 > 222 ; L3 > 222 ; L3 > 222 ; L2 > > the result should look like this: > > 111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1 > 222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1 > > i hope you know what i mean. > > ty > You don't have to create an aggregate function. I have similar problem where the part numbers have to order based on the contents and the first thing you have to do is split it apart then set the ordering you want. This gives you an idea of what you can do and what the results look like. If the data in the table is laid out as you describe with where 111 and L1 are different fields its very easy and you don't have to create an aggregate function Select '111', 'L1', regexp_replace( 'L1', '[1-9 `]+', ''), regexp_replace( 'L1', '[a-zA-Z `]+', '')::integer union Select '111', 'L3', regexp_replace( 'L3', '[1-9 `]+', ''), regexp_replace( 'L3', '[a-zA-Z `]+', '')::integer union Select'111', 'L2', regexp_replace( 'L2', '[1-9 `]+', ''), regexp_replace( 'L2', '[a-zA-Z `]+', '')::integer order by 3, 4 if the data is 111;L1 in a single field its still very easy. Example like so Select split_part('111;L1', ';',1), split_part('111;L1', ';',2), regexp_replace( split_part('111;L1', ';',2), '[1-9 `]+',''), regexp_replace( split_part('111;L1', ';',2), '[a-zA-Z `]+', '')::integer union Select split_part('111;L3', ';',1), split_part('111;L3', ';',2), regexp_replace( split_part('111;L3', ';',2), '[1-9 `]+',''), regexp_replace( split_part('111;L3', ';',2), '[a-zA-Z `]+', '')::integer union Select split_part('111;L2', ';',1), split_part('111;L2', ';',2), regexp_replace( split_part('111;L2', ';',2), '[1-9 `]+',''), regexp_replace( split_part('111;L2', ';',2), '[a-zA-Z `]+', '')::integer order by 3, 4 desc
On Thu, Jul 2, 2009 at 3:48 PM, Jasmin Dizdarevic<jasmin.dizdarevic@gmail.com> wrote: > customer ; seg > 111 ; L1 > 111 ; L2 > 111 ; L1 > 222 ; L3 > 222 ; L3 > 222 ; L2 > > the result should look like this: > > 111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1 > 222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1 Are you just looking for the most frequent seg for each customer? select distinct on (customer) customer,seg from (select customer, seg, count(*) as n from tab group by seg) order by customer,n desc That doesn't give the ratios though. -- greg http://mit.edu/~gsstark/resume.pdf
Hello, Le 2/07/09 23:21, Greg Stark a écrit : > On Thu, Jul 2, 2009 at 3:48 PM, Jasmin > Dizdarevic<jasmin.dizdarevic@gmail.com> wrote: >> customer ; seg >> 111 ; L1 >> 111 ; L2 >> 111 ; L1 >> 222 ; L3 >> 222 ; L3 >> 222 ; L2 >> >> the result should look like this: >> >> 111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1 >> 222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1 > > Are you just looking for the most frequent seg for each customer? > > select distinct on (customer) customer,seg > from (select customer, seg, count(*) as n from tab group by seg) > order by customer, n desc > > That doesn't give the ratios though. > Some errors would occur with the above query from within the subquery because of a projection on customer column that is not a grouped column (neither an uniquely identified column by seg--seg is assumed not to be a primary key here according to the given examples of value). By the way, the ordered column n is not a member of the projected columns. Nevertheless, in the same direction as Greg Stark, the following query would approach the target result: SELECT T5A.customer, T5A.seg FROM (SELECT T2.customer, T2.seg, COUNT(*) AS nbFROM cst T2GROUP BY T2.customer, T2.seg ) T5A INNER JOIN (SELECT T4.customer, MAX(T4.nb) AS maxNbFROM ( SELECT T2B.customer, T2B.seg, COUNT(*) AS nb FROM cstT2B GROUP BY T2B.customer, T2B.seg) T4GROUP BY T4.customer ) T5B ON T5A.customer = T5B.customer AND T5A.nb = T5B.maxNb where 'cst' is the reference table including 'customer' and 'seg' columns. This query considers a join between two tables: - the first table, aliased T5A, counts the multiplicity for each couple of 'customer' and 'seg'; - the secund table, aliased T5B, reveals the highest multiplicity for each 'customer' with regard to each 'seg' attached to customer. Then the join only retains the couples of 'customer' and 'seg' whom multiplicity equals the higher for the 'customer' of the current couple. Each multiplicity may be easily added to the resulting records by spanning the projected columns with T5A.nb column. A sort of ratio may also be added; one aggregated column may first be added to table T4 as SUM(T4.nb) -- that is, sum of multiplicity for each customer; then the resulting join may compute the ratio between the highest multiplicity and the sum of multiplicity for each customer, in addition to the seg (or list of seg) for which the multiplicity is the highest. Here is the modified query: SELECT T5A.customer, T5A.seg, T5A.nb/T5B.sumNb FROM (SELECT T2.customer, T2.seg, COUNT(*) AS nbFROM cst T2GROUP BY T2.customer, T2.seg ) T5A INNER JOIN (SELECT T4.customer, MAX(T4.nb) AS maxNb, SUM(T4.nb) AS sumNbFROM ( SELECT T2B.customer, T2B.seg, COUNT(*)AS nb FROM cst T2B GROUP BY T2B.customer, T2B.seg) T4GROUP BY T4.customer ) T5B ON T5A.customer = T5B.customer AND T5A.nb = T5B.maxNb As is, this query may result to multiple couples of customer and seg with the same customer value when many seg relie as many times as many others for the current customer. This "edge effect" may be avoided in many ways depending on the original purpose. Regards. -- nha / Lyon / France.
Hi,
thank you everybody for your help. The classy solution posted by nha works great!
Regards
Jasmin
2009/7/3 nha <lyondif02@free.fr>
Hello,
Le 2/07/09 23:21, Greg Stark a écrit :Some errors would occur with the above query from within the subquery because of a projection on customer column that is not a grouped column (neither an uniquely identified column by seg--seg is assumed not to be a primary key here according to the given examples of value). By the way, the ordered column n is not a member of the projected columns.On Thu, Jul 2, 2009 at 3:48 PM, Jasmin
Dizdarevic<jasmin.dizdarevic@gmail.com> wrote:customer ; seg
111 ; L1
111 ; L2
111 ; L1
222 ; L3
222 ; L3
222 ; L2
the result should look like this:
111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1
Are you just looking for the most frequent seg for each customer?
select distinct on (customer) customer,seg
from (select customer, seg, count(*) as n from tab group by seg)
order by customer, n desc
That doesn't give the ratios though.
Nevertheless, in the same direction as Greg Stark, the following query would approach the target result:
SELECT T5A.customer, T5A.seg
FROM (
SELECT T2.customer, T2.seg, COUNT(*) AS nb
FROM cst T2
GROUP BY T2.customer, T2.seg
) T5A INNER JOIN (
SELECT T4.customer, MAX(T4.nb) AS maxNb
FROM (
SELECT T2B.customer, T2B.seg, COUNT(*) AS nb
FROM cst T2B
GROUP BY T2B.customer, T2B.seg
) T4
GROUP BY T4.customer
) T5B
ON T5A.customer = T5B.customer
AND T5A.nb = T5B.maxNb
where 'cst' is the reference table including 'customer' and 'seg' columns. This query considers a join between two tables:
- the first table, aliased T5A, counts the multiplicity for each couple of 'customer' and 'seg';
- the secund table, aliased T5B, reveals the highest multiplicity for each 'customer' with regard to each 'seg' attached to customer.
Then the join only retains the couples of 'customer' and 'seg' whom multiplicity equals the higher for the 'customer' of the current couple.
Each multiplicity may be easily added to the resulting records by spanning the projected columns with T5A.nb column. A sort of ratio may also be added; one aggregated column may first be added to table T4 as SUM(T4.nb) -- that is, sum of multiplicity for each customer; then the resulting join may compute the ratio between the highest multiplicity and the sum of multiplicity for each customer, in addition to the seg (or list of seg) for which the multiplicity is the highest.
Here is the modified query:
SELECT T5A.customer, T5A.seg, T5A.nb/T5B.sumNb
FROM (
SELECT T2.customer, T2.seg, COUNT(*) AS nb
FROM cst T2
GROUP BY T2.customer, T2.seg
) T5A INNER JOIN (
SELECT T4.customer, MAX(T4.nb) AS maxNb, SUM(T4.nb) AS sumNb
FROM (
SELECT T2B.customer, T2B.seg, COUNT(*) AS nb
FROM cst T2B
GROUP BY T2B.customer, T2B.seg
) T4
GROUP BY T4.customer
) T5B
ON T5A.customer = T5B.customer
AND T5A.nb = T5B.maxNb
As is, this query may result to multiple couples of customer and seg with the same customer value when many seg relie as many times as many others for the current customer. This "edge effect" may be avoided in many ways depending on the original purpose.
Regards.
--
nha / Lyon / France.